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Abstract 

We  introduce  a  randomized  procedure  that,  given  an  m  x  n  matrix  A  and  a  positive 
integer  k,  approximates  A  with  a  matrix  Z  of  rank  k.  The  algorithm  relies  on  applying 
a  structured  l  x  m  random  matrix  1Z  to  each  column  of  A,  where  l  is  an  integer  near 
to,  but  greater  than,  k.  The  structure  of  1Z  allows  us  to  apply  it  to  an  arbitrary  mn  x  1 
vector  at  a  cost  proportional  to  m  log(Z) ;  the  resulting  procedure  can  construct  a  rank-A 
approximation  Z  from  the  entries  of  A  at  a  cost  proportional  to  mn  log (A)  +  Z2  (m  +  n). 

We  prove  several  bounds  on  the  accuracy  of  the  algorithm;  one  such  bound  guarantees 
that  the  spectral  norm  ||A  —  Z\\  of  the  discrepancy  between  A  and  Z  is  of  the  same 
order  as  yTnaxj m,  n}  times  the  (A  +  l)st  greatest  singular  value  oy.+i  of  A,  with  small 
probability  of  large  deviations. 

In  contrast,  the  classical  pivoted  UQ R”  decomposition  algorithms  (such  as  Gram- 
Schnridt  or  Householder)  require  at  least  kmn  floating-point  operations  in  order  to 
compute  a  similarly  accurate  rank-A  approximation.  In  practice,  the  algorithm  of  this 
paper  is  faster  than  the  classical  algorithms,  as  long  as  k  is  neither  very  small  nor  very 
large.  Furthermore,  the  algorithm  operates  reliably  independently  of  the  structure  of 
the  matrix  A,  can  access  each  column  of  A  independently  and  at  most  twice,  and 
parallelizes  naturally.  The  results  are  illustrated  via  several  numerical  examples. 

1  Introduction 

In  many  applications  it  is  important  to  be  able  to  construct  low-rank  approximations  to 
matrices.  Such  approximations  help  to  characterize  the  structure  of  linear  operators,  and  to 
facilitate  rapid  calculations  involving  them.  One  classical  form  of  these  approximations  is 
the  singular  value  decomposition  (SVD),  which  is  known  in  the  statistical  literature  as  the 
principal  component  analysis  (PCA).  Another  classical  form  is  the  approximation  obtained 
via  subset  selection;  we  will  refer  to  the  matrix  representation  obtained  via  subset  selection 
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as  an  interpolative  decomposition.  These  two  types  of  matrix  approximations  are  defined  as 
follows. 

An  approximation  to  an  SVD  of  a  complex  m  x  n  matrix  A  consists  of  nonnegative  real 
numbers  ay,  ay,  . . . ,  ay,_i,  &k  known  as  singular  values,  orthonormal  complex  mxl  column 
vectors  u^\  u^2\  . . . ,  u^k~l\  u ^  known  as  left  singular  vectors,  and  orthonormal  complex 
n  x  1  column  vectors  v^\  v (2\  . . . ,  v^k~l\  known  as  right  singular  vectors,  such  that 


k 

A  —  <jj  (v^)* 

3= 1 


<8, 


(1) 


where  k ,  m,  and  n  are  positive  integers  with  k  <  m  and  k  <  n,  8  is  a  positive  real  number 
specifying  the  precision  of  the  approximation,  and,  for  any  matrix  P,  ||P||  denotes  the 
spectral  (72-operator)  norm  of  B ,  that  is,  ||P||  is  the  greatest  singular  value  of  B.  An 
approximation  to  an  SVD  of  A  is  often  written  in  the  equivalent  form 


A&U  sr,  (2) 

where  U  is  a  complex  m  x  k  matrix  whose  columns  are  orthonormal,  V  is  a  complex  n  x  k 
matrix  whose  columns  are  orthonormal,  and  E  is  a  real  diagonal  k  x  k  matrix  whose  entries 
are  all  nonnegative.  See,  for  example,  [14]  for  a  more  detailed  discussion  of  SVDs. 

An  interpolative  decomposition  of  a  complex  m  x  n  matrix  A  consists  of  a  complex  m  x  k 
matrix  B  whose  columns  constitute  a  subset  of  the  columns  of  A,  and  a  complex  k  x  n  matrix 
P,  such  that 

1.  some  subset  of  the  columns  of  P  makes  up  the  k  x  k  identity  matrix, 

2.  no  entry  of  P  has  an  absolute  value  greater  than  2,  and 

3.  A  =  BP. 

See,  for  example,  [12],  [5],  [13],  [9],  [18],  or  Sections  4  and  5  of  [4]  for  a  discussion  of 
interpolative  decompositions. 

The  present  article  introduces  an  algorithm  for  the  computation  of  a  low-rank  approxi¬ 
mation  of  either  type  to  an  arbitrary  matrix.  The  algorithm  is  generally  at  least  as  efficient 
as  pivoted  Gram-Schmidt  and  the  other  classical  pivoted  “Q  P”  decomposition  algorithms, 
and  often  substantially  more  efficient.  In  order  to  construct  a  nearly  optimal  rank- A;  approx¬ 
imation  to  a  complex  n  x  n  matrix,  any  of  the  standard  schemes  (such  as  Gram-Schmidt  or 
Householder)  requires  at  least 

0{kn2)  (3) 

floating-point  operations  (see,  for  example,  Chapter  5  in  [8]).  In  contrast,  the  algorithm  of 
Subsection  5.2  of  the  present  paper  requires 

0{n2  log  (A;)  +  nl2)  (4) 

floating-point  operations,  where  l  is  an  integer  near  to,  but  greater  than,  k. 

In  practice,  the  scheme  of  the  present  article  is  more  efficient  than  standard  methods 
whenever  l  <  n  and  k  is  not  extremely  small  (see  Section  6  below).  Furthermore,  the  scheme 
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of  the  present  paper  requires  less  storage  whenever  the  input  matrix  is  to  be  preserved, 
applies  naturally  to  matrices  whose  entries  are  to  be  evaluated  on-the-fly,  rather  than  stored 
in  memory,  and  parallelizes  trivially.  Thus,  the  algorithm  described  below  would  seem  to 
be  preferable  to  the  classical  algorithms  for  the  construction  of  low-rank  approximations  to 
medium-  and  large-scale  dense  matrices,  or  (more  or  less  equivalently)  for  the  computation 
of  a  few  of  the  greatest  singular  values  of  matrices  and  their  corresponding  singular  vectors. 

Unlike  the  classical  algorithms,  the  scheme  of  the  present  paper  is  a  randomized  one,  and 
fails  with  a  small  probability.  However,  one  can  determine  rapidly  whether  the  algorithm 
has  succeeded,  using  a  verification  scheme  such  as  that  described  in  Subsection  3.4.  If 
the  algorithm  were  to  fail,  then  one  could  run  the  algorithm  again  with  an  independent 
realization  of  the  random  variables  involved,  in  effect  boosting  the  probability  of  success  at 
a  reasonable  additional  expected  cost.  In  fact,  the  randomized  algorithm  succeeded  during 
every  trial  reported  in  the  numerical  experiments  of  Section  6,  obviating  the  need  to  run  the 
algorithm  again. 

The  algorithm  of  [13]  is  similar  to  the  algorithm  of  the  present  paper.  The  core  steps 
of  both  algorithms  involve  the  rapid  computation  of  the  product  of  a  random  matrix  and 
the  matrix  to  be  approximated.  The  algorithm  of  [13]  assumes  that  the  matrix  to  be  ap¬ 
proximated  (and  its  transpose)  can  be  applied  rapidly  to  arbitrary  vectors,  thus  enabling 
the  rapid  computation  of  the  product  of  the  matrix  to  be  approximated  (or  its  transpose) 
and  any  matrix.  In  contrast,  the  algorithm  of  the  present  paper  utilizes  a  random  matrix  TZ 
which  can  be  applied  rapidly  to  arbitrary  vectors,  thus  enabling  the  rapid  computation  of 
the  product  of  7 Z  and  any  matrix. 

The  matrix  TZ  employed  in  the  present  paper  consists  of  several  uniformly  randomly  se¬ 
lected  rows  of  the  product  of  a  discrete  Fourier  transform  matrix  and  a  random  diagonal 
matrix.  The  fast  Fourier  transform  and  similar  algorithms  allow  the  rapid  application  of 
TZ  to  arbitrary  vectors  (see,  for  example,  [14]  for  a  discussion  of  the  fast  Fourier  transform 
algorithm  and  its  applications).  The  idea  of  using  a  random  matrix  with  such  structure  has 
been  introduced  in  [1],  The  idea  of  using  such  a  matrix  in  numerical  linear  algebra  (specifi¬ 
cally,  for  the  purpose  of  computing  a  solution  in  the  least-squares  sense  to  an  overdetermined 
system  of  linear-algebraic  equations)  has  been  introduced  in  [15],  utilizing  both  [1]  and  [7]. 

It  should  be  observed  that  there  is  nothing  magical  about  our  choice  of  the  matrix  TZ. 
In  our  experience,  several  other  constructions  work  just  as  well;  for  example,  the  Fourier 
transform  utilized  in  the  present  paper  can  be  replaced  with  the  Walsh-Hadamard  transform 
(see  [1]  or  [19]).  We  are  investigating  several  possible  alternatives  (see  [2]).  For  simplicity, 
we  discuss  here  only  complex  matrices;  our  preliminary  report  [19]  discusses  an  early  version 
of  the  algorithm  tailored  for  real  matrices. 

The  present  paper  has  the  following  structure:  Section  2  sets  the  notation.  Section  3 
collects  together  various  known  facts  which  later  sections  utilize.  Section  4  provides  the 
principal  lemmas  which  Section  5  uses  to  construct  algorithms.  Section  5  describes  the 
algorithm  of  the  present  paper,  providing  details  about  its  accuracy  and  computational 
costs.  Section  6  illustrates  the  performance  of  the  algorithm  via  several  numerical  examples. 
Section  7  draws  several  conclusions  and  proposes  directions  for  further  work. 
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2  Notation 


In  this  section,  we  set  notational  conventions  employed  throughout  the  present  paper. 

We  denote  an  identity  matrix  by  1,  and  a  matrix  whose  entries  are  all  zeros  by  0.  For 
any  matrix  A,  we  define  the  norm  ||kl||  of  A  to  be  the  spectral  (P-operator)  norm  of  A,  that 
is,  \\A\\  is  the  greatest  singular  value  of  A.  For  any  matrix  A,  we  define  A*  to  be  the  adjoint 
of  A.  For  any  complex  number  z,  we  define  z  to  be  the  conjugate  of  z.  We  use  i  =  \/— T 
and  e  =  exp(l).  For  any  nonnegative  integers  n  and  m,  we  define  l  =  (n  mod  m)  to  be  the 
integer  /  such  that  n  —  l  is  a  multiple  of  m  and  0  <  l  <  m  —  1,  that  is,  n  mod  m  is  the 
remainder  after  integer  division  of  n  by  m.  We  use  P  to  take  the  probability  of  an  event,  and 
E  to  take  the  expectation  of  a  random  variable.  We  abbreviate  “independent,  identically 
distributed”  to  “i.i.d.” 

For  any  positive  integer  m,  we  define  the  unnormalized  discrete  Fourier  transform  P*™) 
to  be  the  complex  m  x  m  matrix  with  the  entry 

(. F{m))j:k  =  e-2«i(j-i)(fc-i  )/m  (5) 

for  j,  k  —  1,  2,  . . . ,  m  —  1,  m;  if  the  size  m  is  clear  from  the  context,  then  we  omit  the 
superscript  in  F^m\  denoting  the  unnormalized  discrete  Fourier  transform  by  simply  F. 

We  will  frequently  utilize  the  following  subsampled  randomized  Fourier  transform.  For 
any  positive  integers  l  and  m  with  /  <  m,  we  define  the  l  x  m  SRFT  to  be  the  complex  l  x  m 
random  matrix 

TZ  =  SFD.  (6) 

In  (6),  S  is  the  l  x  m  matrix  whose  entries  are  all  zeros,  aside  from  a  single  1  in  column  Sj  of 
row  j  for  j  —  1,  2,  . . . ,  l  —  1,  /,  where  Si,  s2,  ■ . . ,  s/_ i,  Si  are  i.i.d.  integer  random  variables, 
each  distributed  uniformly  over  {1,  2, ... ,  m  —  1,  m}.  Moreover,  F  is  the  mxm  unnormalized 
discrete  Fourier  transform,  and  D  is  the  diagonal  m  x  m  matrix  whose  diagonal  entries  d\,  d2, 

. . . ,  dm- 1 ,  dm  are  i.i.d.  complex  random  variables,  each  distributed  uniformly  over  the  unit 
circle.  We  call  1Z  an  “SRFT”  for  lack  of  a  better  term. 


3  Preliminaries 

In  this  section,  we  summarize  various  facts  from  linear  algebra,  and  describe  efficient  algo¬ 
rithms  for  computing  an  arbitrary  subset  of  the  outputs  of  a  discrete  Fourier  transform,  as 
well  as  for  identifying  matrices  whose  spectral  norms  are  larger  than  desired. 

3.1  General  facts  from  linear  algebra 

In  this  subsection,  we  summarize  various  general  facts  from  linear  algebra. 

The  following  lemma  states  that,  for  any  m  x  n  matrix  A  whose  rank  is  k,  there  exist  an 
m  x  k  matrix  B  whose  columns  constitute  a  subset  of  the  columns  of  A,  and  a  kxn  matrix 
P,  such  that 

1.  some  subset  of  the  columns  of  P  makes  up  the  k  x  k  identity  matrix, 
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2.  P  is  not  too  large,  and 

3.  B  P  =  A. 

Moreover,  the  lemma  provides  an  analogous  approximation  B  P  to  A  when  the  exact  rank 
of  A  is  not  k,  but  the  (k  +  l)st  singular  value  of  A  is  nevertheless  small.  The  lemma  is  a 
reformulation  of  Theorem  3.2  in  [12]  and  Theorem  3  in  [5]. 

Lemma  3.1  Suppose  that  m  and  n  are  positive  integers,  and  A  is  a  complex  m  x  n  matrix. 

Then,  for  any  positive  integer  k  with  k  <  m  and  k  <  n,  there  exist  a  complex  k  x  n 
matrix  P,  and  a  complex  m  x  k  matrix  B  whose  columns  constitute  a  subset  of  the  columns 
of  A,  such  that 

1.  some  subset  of  the  columns  of  P  makes  up  the  k  x  k  identity  matrix, 

2.  no  entry  of  P  has  an  absolute  value  greater  than  1, 

3.  ||P||  <  y/k{n-k)  +  l, 

4 ■  the  least  ( that  is,  the  kth  greatest)  singular  value  of  P  is  at  least  1, 

5.  B  P  =  A  when  k  —  m  or  k  =  n,  and 

6.  || B  P  —  A\\  <  yfk  {n  —  k)  +  1  cq+i  when  k  <  m  and  k  <  n,  where  cq+i  is  the  ( k  +  l)st 
greatest  singular  value  of  A. 

Remark  3.2  Properties  1,  2,  3,  and  4  in  Lemma  3.1  ensure  that  the  interpolative  decom¬ 
position  B  P  of  A  is  numerically  stable.  Also,  Property  3  follows  directly  from  Properties  1 
and  2,  and  Property  4  follows  directly  from  Property  1. 

Observation  3.3  There  exists  an  algorithm  which  computes  B  and  P  in  Lemma  3.1  from 
A,  provided  that  we  require  only  that 

1.  some  subset  of  the  columns  of  P  makes  up  the  k  x  k  identity  matrix, 

2.  no  entry  of  P  has  an  absolute  value  greater  than  2, 

3.  ||P||  <  \J Ak  (n-k)  +  1, 

4.  the  least  (that  is,  the  kth  greatest)  singular  value  of  P  is  at  least  1, 

5.  B  P  =  A  when  k  —  rn  or  k  —  n,  and 

6.  ||P  P  —  A||  <  y/- 4 k  (n  —  k)  +  1  oq+i  when  k  <  m  and  k  <  n,  where  cq ;+i  is  the  (k  +  l)st 
greatest  singular  value  of  A. 

For  any  positive  real  number  e,  the  algorithm  can  identify  the  least  k  such  that  ||P  P— A||  «  e. 
Furthermore,  there  exists  a  real  number  C  such  that  the  algorithm  computes  both  B  and 
P  using  at  most  Ckmn  log(n)  floating-point  operations  and  Cmn  floating-point  words  of 
memory.  The  algorithm  is  based  upon  the  Cramer  rule  and  the  ability  to  obtain  the  minimal- 
norm  (or  at  least  roughly  minimal-norm)  solutions  to  linear-algebraic  systems  of  equations 
(see  [12],  [5],  and  [10]). 
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The  following  lemma  provides  an  approximation  Q  Z  to  an  n  x  l  matrix  Y  via  an  n  x  k 
matrix  Q  whose  columns  are  orthonormal,  and  a  k  x  /  matrix  Z . 

Lemma  3.4  Suppose  that  k,  l,  and  n  are  positive  integers  with  k  <  l  <  n,  and  Y  is  a 
complex  n  x  l  matrix. 

Then,  there  exist  a  complex  n  x  k  matrix  Q  whose  columns  are  orthonormal,  and  a 
complex  k  x  l  matrix  Z ,  such  that 


\\QZ-Y\\  <  rjk+l,  (7) 

where  gk+i  is  the  (. k  +  l)s*  greatest  singular  value  ofY . 

Proof.  We  start  by  forming  an  SVD  of  Y . 

Y  =  U  sr,  (8) 


where  U  is  a  complex  n  x  l  matrix  whose  columns  are  orthonormal,  V  is  a  complex  /  x  l 
matrix  whose  columns  are  orthonormal,  and  E  is  a  real  diagonal  l  x  l  matrix  whose  entries 
are  all  nonnegative,  such  that 

^j,j  =  Vj  (9) 

for  j  —  1,  2,  —  1,  /,  where  E jj  is  the  entry  in  row  j  and  column  j  of  E,  and  77 j  is  the 

jth  greyest  singular  value  of  Y.  We  define  Q  to  be  the  leftmost  n  x  k  block  of  U,  and  P  to 
be  the  rightmost  n  x  (/  —  k)  block  of  U,  so  that 


U=(Q\P).  (10) 

We  define  Z  to  be  the  uppermost  k  x  l  block  of  E  V* ,  and  X  to  be  the  lowermost  (/  —  k)  x  l 

block  of  EP*,  so  that 

=(-?-).  (11) 

Combining  (8),  (9),  (10),  (11),  and  the  fact  that  the  columns  of  U  are  orthonormal,  as  are 
the  columns  of  V,  yields  (7).  □ 


Observation  3.5  In  order  to  compute  the  matrices  Q  and  Z  in  (7)  from  the  matrix  Y ,  we 
can  construct  (8),  and  then  form  Q  and  Z  according  to  (10)  and  (11).  (See,  for  example, 
Chapter  8  in  [8]  for  details  concerning  the  computation  of  the  SVD.) 

The  following  technical  lemma  will  be  needed  in  Section  4;  Lemma  6  of  [13]  provides  a 
proof. 

Lemma  3.6  Suppose  that  k  and  l  are  positive  integers  with  k  <  l.  Suppose  further  that  G 
is  a  complex  l  x  k  matrix  such  that  G*  G  is  invertible. 

Then, 

||(G'*  G)~l  G*||  =  — ,  (12) 

where  ak  is  the  least  ( that  is,  the  kth  greatest)  singular  value  of  G. 
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3.2  More  specialized  facts  from  linear  algebra 

In  this  subsection,  we  summarize  various  facts  from  linear  algebra  that  are  useful  specifically 
for  the  randomized  approximation  of  matrices. 

The  following  lemma  states  that  the  product  BP  of  matrices  B  and  P  is  a  good  approx¬ 
imation  to  a  matrix  A,  provided  that  there  exists  a  matrix  77  such  that 

1.  the  columns  of  B  constitute  a  subset  of  the  columns  of  A, 

2.  || P ||  is  not  too  large, 

3.  1ZB  P  is  a  good  approximation  to  77  A,  and 

4.  there  exists  a  matrix  T  such  that  ||T||  is  not  too  large,  and  T1ZA  is  a  good  approxi¬ 
mation  to  A. 

Lemma  3.7  Suppose  that  k,  l,  m,  and  n  are  positive  integers  with  k  <  n.  Suppose  further 
that  A  is  a  complex  m  x  n  matrix,  B  is  a  complex  m  x  k  matrix  whose  columns  constitute  a 
subset  of  the  columns  of  A,  P  is  a  complex  k  x  n  matrix,  T  is  a  complex  m  x  l  matrix,  and 
77  is  a  complex  l  x  m  matrix. 

Then, 

\\BP-A\\  <  \\T1ZA  —  A\\  (||P||  +  1)  +  ||T||  \\KBP-KA\\.  (13) 

Proof.  We  observe  that 

\\BP-  A\\  <\\BP  -TKBP\\  +  \\TKBP  -TKA\\  +  \\TKA-  A\\,  (14) 

\\BP -TPBP\\<\\B -TTZB\\\\P\\,  (15) 

and 

\\T11BP -THA\\  <  ||T||  \\KBP-KA\\.  (16) 

Since  the  columns  of  B  constitute  a  subset  of  the  columns  of  A,  it  follows  that  the  columns 
of  B  —  T1ZB  constitute  a  subset  of  the  columns  of  A  —  T1ZA,  and  therefore, 

\\B-TKB\\<\\A-TKA\\.  (17) 

Combining  (14),  (15),  (16),  and  (17)  yields  (13).  □ 

Remark  3.8  Since  the  columns  of  B  constitute  a  subset  of  the  columns  of  A  in  Lemma  3.7, 
it  follows  that  the  columns  of  77  B  constitute  a  subset  of  the  columns  of  77  A.  Conversely, 
whenever  a  matrix  Z  is  formed  by  gathering  distinct  columns  of  Y  —  PA  together  into  Z , 
then  clearly  Z  =  77  B  for  some  matrix  B  whose  columns  constitute  a  subset  of  the  columns 
of  A 

The  following  lemma  states  that  the  product  A  Q  Q*  of  matrices  A,  Q,  and  Q*  is  a  good 
approximation  to  a  matrix  A,  provided  that  there  exist  matrices  77  and  Z  such  that 

1.  the  columns  of  Q  are  orthonormal, 
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2.  Q  Z  is  a  good  approximation  to  (HA)*,  and 

3.  there  exists  a  matrix  T  such  that  ||T||  is  not  too  large,  and  T  77  71  is  a  good  approxi¬ 
mation  to  A. 

Lemma  17  of  [13]  provides  a  proof  for  the  following  lemma. 

Lemma  3.9  Suppose  that  k,  l,  m,  and  n  are  positive  integers  with  k  <  n.  Suppose  further 
that  A  is  a  complex  mxn  matrix,  Q  is  a  complex  nxk  matrix  whose  columns  are  orthonormal, 
Z  is  a  complex  k  x  l  matrix,  T  is  a  complex  m  x  l  matrix,  and  1Z  is  a  complex  l  x  m  matrix. 
Then, 

\\AQQ*-A\\  <2\\T11A-A\\+2\\T\\  \\Q  Z  -  (K  A)*\\.  (18) 

The  following  lemma  provides  an  efficient  means  of  computing  an  SVD  of  a  complex 
m  x  n  matrix  A  from  a  complex  m  x  k  matrix  B  and  a  complex  k  x  n  matrix  P  such  that 
A  =  B  P  and  k  is  much  less  than  both  m  and  n.  If,  in  addition,  ||P||  <  ||7L||  and  ||P||  is  not 
too  large,  then  the  scheme  described  by  the  lemma  is  numerically  stable.  We  observe  that, 
if  B  and  P  arise  from  an  interpolative  decomposition,  then  indeed  ||P||  <  ||2l||  and  ||P||  is 
not  too  large,  and  so  the  scheme  described  by  the  lemma  is  numerically  stable. 

Lemma  3.10  Suppose  that  k,  m,  and  n  are  positive  integers  with  k  <  m  and  k  <  n. 
Suppose  further  that  A  is  a  complex  mxn  matrix,  B  is  a  complex  m  x  k  matrix,  and  P  is 
a  complex  k  x  n  matrix,  such  that 

A  =  B  P.  (19) 

Suppose  in  addition  that  L  is  a  complex  kxk  matrix,  and  Q  is  a  complex  nxk  matrix  whose 
columns  are  orthonormal,  such  that 

P  =  LQ*.  (20) 


Suppose  finally  that  C  is  a  complex  mxk  matrix,  U  is  a  complex  mxk  matrix  whose  columns 
are  orthonormal,  E  is  a  real  kxk  matrix,  and  W  is  a  complex  kxk  matrix  whose  columns 
are  orthonormal,  such  that 


C  =  BL 

(21) 

and 

C  =  UT,W*. 

(22) 

Then, 

a  =  u  sr, 

(23) 

where  V  is  the  complex  nxk  matrix  given  by  the  formula 

V  =  QW. 

(24) 

Moreover,  the  columns  ofV  are  orthonormal  (as  are  the  columns  ofU),  and 


(25) 


ill  =  Ill’ll- 


Proof.  Combining  (19),  (20),  (21),  (22),  and  (24)  yields  (23).  Combining  (24)  and  the  facts 
that  W  is  unitary  and  that  the  columns  of  Q  are  orthonormal  yields  that  the  columns  of 
V  are  orthonormal.  Combining  (20)  and  the  fact  that  the  columns  of  Q  are  orthonormal 
yields  (25).  □ 


Remark  3.11  The  matrices  L  and  Q  in  (20)  can  be  computed  from  P  as  follows.  Using 
the  algorithms  described,  for  example,  in  Chapter  5  of  [8],  we  construct  an  upper  triangular 
complex  k  x  k  matrix  R,  and  a  complex  n  x  k  matrix  Q  whose  columns  are  orthonormal, 
such  that 

P*  =  Q  R.  (26) 

We  thus  obtain  Q.  We  then  define  L  to  be  the  adjoint  of  R,  that  is, 

L  —  R*.  (27) 


3.3  An  accelerated  fast  Fourier  transform 

In  this  subsection,  we  describe  an  efficient  algorithm  for  computing  an  arbitrary  subset  of 
the  outputs  of  a  discrete  Fourier  transform,  based  on  the  fast  Fourier  transform  (see,  for 
example,  [14]  for  a  discussion  of  the  fast  Fourier  transform  algorithm  and  its  applications). 
The  algorithm  requires  0(n  log(/))  floating-point  operations  in  order  to  compute  l  samples 
of  the  discrete  Fourier  transform  of  a  vector  of  length  n.  [17]  discusses  an  extremely  similar 
algorithm. 

The  following  lemma  is  easily  verified  by  identifying  the  summation  indices  k,  k±,  and  k2 
via  the  equation  k  =  m{k\  —  1)  +  k2. 

Lemma  3.12  Suppose  that  l,  m,  and  n  are  positive  integers  with  n  —  l  ■  rn,  and  v  is  a 
complex  n  x  1  column  vector. 

Then, 

n 

^  e-27u(j-l)(fc-l)/n  ^ 

k= 1 

m  l 

—  'Y'  .  e-2ni(j2-l){k2-l)/n  _  Vm^kl_i)+k.  (28) 

&2— 1  ki=l 

for  ji  =  1,  2,  m-  1,  m  and  j2  =  1,  2,  . . . ,  l  -  1,  l,  where  j  =  l(ji  -  1)  +  j2. 

Suppose  that  /,  m,  and  n  are  positive  integers  with  n  —  l  ■  m,  and  v  and  z  are  complex 
n  x  1  column  vectors,  such  that  z  =  F(n'>  v,  where  F ^  is  the  n  x  n  unnormalized  discrete 
Fourier  transform.  Then,  it  follows  from  (28)  that  the  following  procedure  computes  z  from 
v: 


1.  Viewing  the  vector  v  as  an  l  x  m  matrix  V  stored  in  row- major  order,  form  the  product 
W  of  the  l  x  /  unnormalized  discrete  Fourier  transform  and  V,  so  that 


W  =  F(0  V. 


(29) 
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2.  Multiply  the  entry  in  row  j  and  column  k  of  W  by  e  2m^  bO  b/n  for  j  —  1,2,..., 
1  —  1,1  and  k  —  1,2,  . . . ,  m  —  1,  m,  in  order  to  obtain  the  /  x  m  matrix  X. 

3.  Transpose  X  to  obtain  an  m  x  l  matrix  Y,  so  that 

Y  =  XT.  (30) 

4.  Form  the  product  Z  of  the  m  x  m  unnormalized  discrete  Fourier  transform  Fl'rn'!  and 
Y,  so  that 

Z  =  F(m)  Y.  (31) 

View  the  m  x  l  matrix  Z  as  a  vector  z  stored  in  row-major  order. 

If  we  only  need  to  compute  l  entries  of  z  —  F(n)  v,  then  we  can  use  Steps  1-3  above  in 
their  entirety  to  obtain  Y,  and  then  compute  the  desired  entries  of  z  directly  from  the  entries 
of  Y.  Step  1  costs  0(m  ■  l  log (/))  using  the  fast  Fourier  transform,  Step  2  costs  0(m  ■  /),  and 
Step  3  costs  0(m  ■  /).  It  follows  from  (31)  that  each  entry  of  z  is  a  linear  combination  of  m 
entries  of  Y.  Therefore,  computing  the  /  desired  entries  of  z  directly  from  the  entries  of  Y 
costs  0(1  ■  m). 

Summing  up  these  costs  and  using  the  fact  that  l  ■  m  —  n,  we  find  that  computing  any 
specified  l  entries  of  z  —  F ^  v  from  the  entries  of  v  costs 

Cioin  =  0(n  log(/))  (32) 


floating-point  operations. 

3.4  A  randomized  scheme  for  estimating  the  spectral  norm  of  a 
matrix 

In  this  subsection,  we  formalize  the  intuitively  obvious  statement  that  the  product  Ax  of  a 
matrix  A  and  a  random  vector  x  has  a  small  norm  whenever  ||4||  is  small.  Moreover,  ||4a;|| 
is  only  rarely  not  small  whenever  ||4||  is  not  small.  By  applying  A  to  a  short  sequence  of 
independent  vectors  x^,  . . . ,  x^k~l\  x^  and  looking  at  the  results,  we  can  estimate 
||  A  ||  with  very  high  probability  and  acceptable  accuracy. 

Needless  to  say,  other  estimates  of  this  type  have  been  constructed  previously;  those 
in  [3]  are  some  of  the  best-known  ones.  The  estimates  of  [3]  are  different  from  ours  in  several 
respects,  most  notably  in  that  we  work  in  the  Euclidean  norm,  while  the  authors  of  [3] 
work  in  the  max  norm;  in  addition,  the  entries  of  our  vectors  x^\  x^2\  . . . ,  x ^  are 

Gaussian  random  variables,  while  in  [3]  the  entries  are  chosen  uniformly  at  random  from 

{-Mi- 

Theorem  3.15  below  is  the  principal  purpose  of  this  subsection;  we  start  with  two  technical 
lemmas.  The  following  lemma  provides  an  expression  for  the  probability  that  a  certain  trial 
will  succeed  several  times,  in  terms  of  the  probability  that  the  trial  will  succeed  once. 

Lemma  3.13  Suppose  that  p  is  a  positive  real  number,  k,  m,  and  n  are  positive  integers, 
A  is  a  complex  m  x  n  matrix,  and  x  and  M),  x^2\  . . . ,  x aM  are  n  x  1  i.i.d.  random 
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vectors  with  i.i.d.  entries,  each  distributed  as  a  complex  Gaussian  random  variable  of  zero 
mean  and  unit  variance. 

Then, 


||  Ax 


O')  I 


||xW|| 

Proof.  Clearly, 

|| 


<  p  ||A||  for  all  j  =  1,2, . . . ,  k  —  1,  k  >  =  (  P 


||Aa:| 


x 


<  t\\M 


\x 


0)1 


<  p  ||A||  for  all  j  —  1,  2, . . . ,  k  —  1,  k  >  =  P  |  |^| 


||  Ax 


0)1 


'll  11^1 


It  follows  from  the  independence  of  x^2\  . . . ,  xfk  l\  that 


p(Q{w 


||  Ax 


0)1 


\x 


0)1 


<f*\\A\\ 


(33) 


<hPII  •  (34) 


(35) 


It  follows  from  the  fact  that  a^1),  x^2\  . . . ,  x^k  jfG  are  all  distributed  the  same  as  x  that 


P 


pi® 


<  p\\A\\ 


<t\\a\\ 


(36) 


for  j  =  1,  2,  . . . ,  k  —  1,  k.  Combining  (34),  (35),  and  (36)  yields  (33). 


□ 


Given  a  matrix  A  and  a  random  vector  x,  the  following  lemma  estimates  the  probability 
that  1 1 A  x 1 1  is  small  compared  to  ||A||  •  ||a;||.  Theorem  1  in  [6]  provides  another  formulation 
of  this  lemma. 


Lemma  3.14  Suppose  that  p  is  a  positive  real  number,  m  and  n  are  positive  integers,  A  is 
a  complex  m  x  n  matrix,  and  x  is  an  nx  1  random  vector  with  i.i.d.  entries,  each  distributed 
as  a  complex  Gaussian  random  variable  of  zero  mean  and  unit  variance. 

Then, 

<  /a  ||A||  |  <  0.8  p  \fn. 

The  following  theorem  provides  an  efficient  means  for  testing  whether  the  spectral  norm 
of  a  matrix  exceeds  a  user-specified  threshold. 

Theorem  3.15  Suppose  that  p  is  a  positive  real  number,  k,  m,  and  n  are  positive  integers, 
A  is  a  complex  m  x  n  matrix,  and  x^2\  . . . ,  x^k~l\  :/fk)  are  nx  1  i.i.d.  random  vectors 
with  i.i.d.  entries,  each  distributed  as  a  complex  Gaussian  random  variable  of  zero  mean  and 
unit  variance. 

Then, 


pf  WAxU) 

\  ||zG)|| 


<  p  ||A||  for  every  j 


(38) 
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Proof.  Combining  (33)  and  (37)  yields  (38).  □ 

We  now  consider  an  application  of  Theorem  3.15. 

On  the  one  hand,  suppose  that  £  is  a  positive  real  number,  m  and  n  are  positive  integers, 
and  A  is  a  complex  m  x  n  matrix,  such  that 

PH>80Vne.  (39) 


To  ascertain  computationally  that  A  has  a  spectral  norm  greater  than  e,  we  apply  A  to  half 
a  dozen  n  x  1  i.i.d.  random  vectors  x^2\  x^3\  x^,  x^5\  x^  with  i.i.d.  entries,  each 
distributed  as  a  complex  Gaussian  random  variable  of  zero  mean  and  unit  variance.  We  then 
check  whether  at  least  one  of  the  numbers 

llTLx^H  ||kla;(2)||  ||^4x^||  ||^4x^||  ||Ho;(5)||  ||t4x^|| 

llaaO)  II  ’  ||a;(2)||  ’  |M3)||  ’  |M4)||  ’  |M5)||  ’  IM6UI  ^ 


is  at  least  e.  Combining  (39)  and  (38)  with  n  =  yields  that 


<  £  for  every  j  =  1,  2,  3, 4,  5,  6  >  <  10 


Thus,  we  are  unlikely  to  find  that  all  of  the  numbers  (40)  are  less  than  e.  Obviously,  if  the 
spectral  norm  of  A  is  even  greater  than  SOy7^  e,  then  we  are  even  less  likely  to  find  that  all 
of  the  numbers  (40)  are  less  than  e. 

On  the  other  hand,  suppose  that  £  is  a  positive  real  number,  m  and  n  are  positive  integers, 
and  A  is  a  complex  m  x  n  matrix,  such  that 


HI  <  e.  (42) 

We  apply  A  to  half  a  dozen  n  x  1  i.i.d.  random  vectors  a^1),  x^2\  x^\  x^ ,  x^5\  x^  with 
i.i.d.  entries,  each  distributed  as  a  complex  Gaussian  random  variable  of  zero  mean  and  unit 
variance.  Then,  (42)  guarantees  that  all  of  the  numbers  (40)  will  be  less  than  e. 

Hence,  by  applying  matrices  to  a  few  random  vectors,  we  can  with  very  high  probability 
filter  out  those  matrices  whose  spectral  norms  are  significantly  larger  than  a  user-specified 
precision  e,  while  always  passing  all  of  the  matrices  whose  spectral  norms  are  less  than  e. 
Thus,  we  can  efficiently  and  reliably  identify  matrices  whose  spectral  norms  are  larger  than 
desired,  even  when  we  cannot  afford  to  form  the  individual  entries  of  the  matrices. 


Remark  3.16  It  is  also  possible  to  estimate  the  spectral  norm  of  a  matrix  using  the  power 
method  (or  the  Lanczos  method  when  tighter  bounds  are  desired)  with  a  random  starting 
vector.  The  analysis  in  [11]  guarantees  a  better  bound  for  both  the  power  method  and  the 
Lanczos  method  as  compared  with  the  scheme  of  the  present  subsection  when  running-times 
are  proportional  to  operation  counts,  and  storage  is  not  an  issue.  However,  the  power  or 
Lanczos  methods  require  successive  applications  of  the  matrix  being  tested  (as  well  as  its 
transpose)  to  a  sequence  of  vectors  generated  on-the-fly,  whereas  the  scheme  of  the  present 
subsection  requires  only  the  application  of  the  matrix  being  tested  to  a  collection  of  indepen¬ 
dently  generated  vectors.  Thus,  in  some  circumstances  the  scheme  of  the  present  subsection 
parallelizes  better  than  the  power  and  Lanczos  methods. 
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4  Mathematical  apparatus 

In  this  section,  we  describe  the  principal  mathematical  tools  used  in  Section  5. 


4.1  Several  technical  lemmas 

In  this  subsection,  we  prove  several  lemmas  needed  for  the  proof  of  Lemma  4.4  in  Subsec¬ 
tion  4.2. 

The  following  lemma  evaluates  the  mean  and  variance  of  a  certain  random  variable  similar 
to  a  random  walk. 


Lemma  4.1  Suppose  that  l  and  m  are  positive  integers  with  l  <  m,  s\,  s2,  ■  ■  ■ ,  i  ,  si  are 

i.i.d.  integer  random  variables,  each  distributed  uniformly  over  {1,  2, _ ,  m  —  1,  m},  F  is  the 

m  x  m  unnormalized  discrete  Fourier  transform,  Gsqb  is  the  complex  number 

Gsqb  =  Fsq  Fsb  (43) 


for  s,  q,b  =  1,  2,  ...,  m  —  1,  m,  and  Hqb  is  the  complex  number 


for  q,  b  =  1,  2,  . . . ;  m  —  1,  m. 
Then, 

for  q  =  1,2,  . . . ,  m  —  1,  m, 
when  q  b,  and 
when  q=fb. 


r=  1 


Hqq  l, 

E  Hqb  =  0 
E  \Hqb\2  =  l 


Proof.  First,  we  prove  (45).  It  follows  from  (43)  that 


G 


sqq  ~ 


1 


for  s,  q  —  1,  2,  . . . ,  m  —  1,  m.  Combining  (44)  and  (48)  yields  (45). 
Next,  we  prove  (46).  It  follows  from  (44)  that 


(44) 

(45) 

(46) 

(47) 

(48) 


i 

E  Hqb  =  ^  E  GSrqb  (49) 

r=l 

for  q,  b  —  1,  2,  . . . ,  m  —  1,  m.  For  any  r  =  1,  2,  . . . ,  l  —  1,  /,  it  follows  from  the  fact  that  sr 
is  distributed  uniformly  over  {1,  2, . . . ,  m  —  1,  m}  that 

^  m 

E  Gs  qb  =  -  V  Gsqb  (50) 

m  z — ' 
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for  q,  b  —  1,  2,  . . . ,  m  —  1,  m.  Combining  (43)  and  the  fact  that  distinct  columns  of  F  are 
orthogonal  yields  that 

m 

=  0  (51) 

8=1 

when  q  7^  b.  Combining  (49),  (50),  and  (51)  yields  (46). 

Finally,  we  prove  (47).  It  follows  from  (44)  that 


i 


l^l2  =  XlG»^l2  +  EG»^G^ 

r= 1  r^t 

(52) 

for  q,  b  =  1,  2,  ... , 

m  —  1,  m.  It  follows  from  (52)  that 

i 

E \Hqb\2  =  Y,  E \GSrqb\2  +  J2  E  GSrqb  Gstqb 

r=  1  r^t 

(53) 

for  q,  b  —  1,  2,  . . . ,  m  —  1,  m. 

It  follows  from  (43)  that 

Gxgfc  =  l-^sgl  \Fsb\ 

(54) 

for  s,q,b  —  1,2,.. 

. ,  m  —  1,  m.  However, 

\l?sq\  =  \  J?sb\  =  1 

(55) 

for  s,q,b  —  1,2,.. 

. ,  m  —  1,  m.  Combining  (54)  and  (55)  yields  that 

E  \GSrqb\2  =  1 

(56) 

for  r  =  1,  2,  ...,/  —  1,  l  and  q,b  —  1,2,  . . . ,  m  —  1,  m. 

It  follows  from  the  independence  of  si,  s2,  •  •  • ,  S/_i,  Si  that 

E  GSrqb  GStqb  —  (E  GSr,qb)  (E  GStqb) 

(57) 

for  q,  b  =  1,  2,  ... , 

m  —  1,  m ,  when  r  ^  t.  Combining  (57),  (50),  and  (51)  yields  that 

E  GSrqb  GStqb  =  0 

(58) 

for  q,  b  —  1,  2,  . . . ,  m  —  1,  m,  when  r  ^  t. 

Combining  (53),  (56),  and  (58)  yields  (47). 

□ 

We  will  need  the  following  technical  lemma  in  order  to  prove  Lemma  4.4  below. 

Lemma  4.2  Suppose  that  k,  l,  and  m  are  positive  integers,  such  that  k  <  l  <  m.  Suppose 
further  that  Hqb  is  the  complex  number  defined  in  (44)  and  (4$)>  for  b  =  1,2,  . . . ,  m—1,  m. 
Suppose  finally  that  1Z  is  the  l  x  m  SRFT  defined  in  Section  2,  U  is  a  complex  m  x  k  matrix 
whose  columns  are  orthonormal,  C  is  the  complex  k  x  k  matrix  defined  via  the  formula 

C  =  (77  C)*  (77 C),  (59) 
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and  E  is  the  complex  k  x  k  matrix  with  the  entry 


EPa  =  ^2  dq  Uip  ^2  db  Uba  Hqb  (6°) 

9=1  &#9 

for  p,a  =  1,  2,  . . . ,  k  —  1,  k,  where  d\,  d2,  ■  ■■,  dm_  i,  dm  are  the  i.i.d.  complex  random 
variables,  each  distributed  uniformly  over  the  unit  circle,  used  in  the  construction  of  D  in  (6) 
for  the  SRFT  K. 

Then, 

C  =  F1  +  E.  (61) 


Proof.  For  any  integers  a  and  p  with  1  <  a  <  k  and  1  <  p  <  k,  combining  (59)  and  (6) 
yields  that 

l  m  m 

Cpa  =  y,  y,  FSrg  dg  Uqp  FSrb  db  Uba.  (62) 

r=  1  q=l  6=1 

Combining  (62),  (43),  and  (44)  yields  that 


m  m 

EPa  =  'y  "j  M<?l 2  UqpUqa  Hqg  +  ^  "  dq  Uqp  ^  "  db  Uba  Hqb .  (63) 

9=1  9=1  b^q 


Combining  (63),  (45),  the  fact  that  \dq\  =  1,  and  the  fact  that  the  columns  of  U  are 
orthonormal  yields  that 

m 

Cpp  =  l  +  'y  ^  dq  Uqp  y>  ^  db  Ubp  Hqb,  (64) 

9=1  b^q 


and 


a 


pa 


m 

^2  dq  Uqp  E  db  Uba  Hqb 

9=1  b^q 


when  p  ^  a. 

Combining  (64),  (65),  and  (60)  yields  (61). 


(65) 

□ 


The  following  lemma  states  that  the  spectral  norm  of  the  matrix  E  defined  in  (60)  is 
reasonably  small  with  high  probability. 


Lemma  4.3  Suppose  that  a  and  (3  are  real  numbers  greater  than  1,  and  k,  l,  and  m  are 
positive  integers,  such  that 

a2  3 

m>  l>  - -r -  k2.  (66) 

(a  —  l)z 

Suppose  further  that  1Z  is  the  l  x  m  SRFT  defined  in  Section  2,  U  is  a  complex  mx  k  matrix 
whose  columns  are  orthonormal,  and  E  is  the  complex  kxk  matrix  defined  in  (60),  with  Hqb 
being  the  complex  number  defined  in  (44)  and  (43),  and  with  d\,  d2,  . . . ,  dm- 1,  dm  being  the 
i.i.d.  complex  random  variables,  each  distributed  uniformly  over  the  unit  circle,  used  in  the 
construction  of  D  in  (6)  for  the  SRFT  7Z. 
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Then, 

ll£ll  -  '  “  o)  (67) 

with  probability  at  least  1  — 

Proof.  We  first  derive  an  upper  bound  on  E  \Epa\2,  for  p,  a  —  1,  2,  . . . ,  k  —  1,  k,  and  then 
use  this  bound  to  prove  (67).  It  follows  from  (60)  that 


E\Epa\2  =  E 


ca  Hrc 


(68) 


Performing  the  summation  over  q  and  r  separately  for  the  cases  when  q  =  r  and  when  q  ^  r, 
and  using  the  fact  that  \dq\  =  1,  we  obtain  that 


e£  d q  Uqp  E  db  Uba  Hqb  J  (  dr  Urp  E  dc  Uca  Hr 


q,r=  1 


b^q 


c^r 


E  E  l^pl 

5=1 


^  ^  db  Uba  Hqb 
b^q 


eE  dq  dr  Uqp  Urp  E  db  Uba  Hqb  ^  ^  dc  Uca  Hrc.  (69) 

q^r  b^q  c^r 


To  bound  the  first  term  in  the  right-hand  side  of  (69),  we  observe  that 


e  E  i^pi 

5=1 


^  ^  db  Uha  Hqb 

b^q 


Ei^i2e 

5=1 


^  db  Uba  Hqb 
b^q 


(70) 


But, 


eE*  Uba  Hqb  —  E  db  Uba  Hqb  E  dc  U ca  HqC- 

b^q  b^q  c^q 

Moreover, 

eE  db  Uba  Hqb  E  dc  U ca  HqC  E  Uba  Uca  E  db  dc  Hqb  Hqc. 

b^q  c^q  b,c^q 

Performing  the  summation  over  b  and  c  separately  for  the  cases  when  b  =  c  and  when  b  ^  c, 
and  using  the  fact  that  db  =  1,  we  obtain  that 


(71) 

(72) 


E]  Uba  Uca  E  db  dc  Hqb  Hqc  —  E^  |0>a|  E\Hqb\'2+  E]  Uba  UcaE  db  dc  Hqb  Hqc.  (73) 

b,c^q  b^q  b,c^q  and  b^c 


It  follows  from  (47)  that 

El^al'El^l^/El^l2- 

b^q  b^q 

It  follows  from  the  fact  that  the  columns  of  U  are  normalized  that 

n 

E  I Uba?  <  E  l^a|2  =  I- 

b^q  b=  1 


(74) 


(75) 
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Combining  (74)  and  (75)  yields  that 

J2\Uba\2V\Hqb\2  <1. 

b^q 

It  follows  from  the  independence  of  the  random  variables  involved  that 

y,  uba  Uca  E  db  dc  Hqb  Hqc  =  y  Uba  Uca  (E  db)  (E  dc)  ( E  Hqb  Hqc) . 

b,c^q  and  b^c  b,c^q  and  b^c 

Combining  (77)  and  the  fact  that  E  db  —  0  (or  E  dc  =  0)  yields  that 


^  ^  Uba  Uca  E  db  dc  Hqb  Hqc  0. 

b,c^q  and  b^c 

Combining  (70),  (71),  (72),  (73),  (76),  and  (78)  yields  that 


E  £  \U„\ 

<7=1 


y  ^  db  Uba  Hqb 

b^q 


< 


l  Wqp\ 
<7=1 


Combining  (79)  and  the  fact  that  the  columns  of  U  are  normalized  yields  that 


Eyitvi 

<7=1 


'y  ^  db  Uba  Hqb 

bH 


<  l. 


To  bound  the  second  term  in  the  right-hand  side  of  (69),  we  observe  that 


(76) 

(77) 

(78) 

(79) 

(80) 


B£  dq  dr  U qp  Urp  £  db  Uba  ddqb  £  dc  H ca  Hr 


q+r 


b+q 


c^r 


b£  dq  dr  Uqp  Urp  I  dr  Ura  Hqr  “t-  £  db  Uba  Hqb  J  I  dq  Uqa  Hrq  -\-  y  ^  dc  Uca  Hrc  J  .  (81) 

q^r  \  b^q,r  /  \  c^q,r  / 


Expanding  the  product,  we  obtain  that 


b£  dq  dr  Uqp  Urp  I  dr  Ura  -Uqr  H-  £  db  Uba  Hqb  J  I  dq  Uqa  Hrq  +  £  dc  Uca  Hrc  j 
g/r  \  b^q,r  /  \  c^q,r  / 


E  ^  ^  dq  dr  Uqp  Urp  dr  dq  Ura  Uqa  Hqr  Hrq 


q^r 


e£  dq  dr  Uqp  U rp  £  db  Uba  Hqb  £  dc  U ca  Hr 

q^r  b^q,r  07^,7* 

+e£  dq  dr  Uqp  U rp  dr  Ura  Hqr  ^  ^  dc  Uca  Hrc 


q^Lr 


Cy £q,r 


B  £  dq  dr  Uqp  Urp  dq  Uqa  Hrq  £  dbUhaHqb.  (82) 

q^r  b^q,r 
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To  evaluate  the  first  term  in  the  right-hand  side  of  (82),  we  observe  that 
E  ^  ^  dq  dr  Uqp  Urp  dr  dq  Ura  Uqa  Hqr  Hrq  ^  ^  Uq p  Urp  Ura  U qa  E  ( dq )  (dr ) “  Hqr  Hrq ■  (83) 

q^r  q^r 

It  follows  from  the  independence  of  the  random  variables  involved  that 

E  (dq)2Jdr)*H^Hrq  =  (E  (dq)2)  (E Jdr?)  (E  Hrq)  (84) 

when  q  ^  r.  Combining  (83),  (84),  and  the  fact  that  E  ( dq )2  =  0  (or  E  ( dr )2  =  0)  yields  that 

E£  dq  dr  Uqp  Urp  dr  dq  Ura  Uqa  Hqr  Hrq  —  0.  (85) 

q^r 

To  evaluate  the  second  term  in  the  right-hand  side  of  (82),  we  note  that  the  independence 
of  the  random  variables  involved  implies  that 

E£  dq  dr  Uqp  Urp  £  db  Uba  Hqb  £  dc  U ca  -Urc 

q^r  by ^q,r  c^q,r 

=  J]£V£MEcy  (e^)  (e  Y  db  Uba  Hqb  £  dcUcaHr}\  .  (86) 


Combining  (86)  and  the  fact  that  E  dq  =  0  (or  E  dr  =  0)  yields  that 

e£  dq  dr  Uqp  Urp  £  db  Uba  Hqb  ^  ^  dc  Uca  Hrc  0*  (87) 

q^r  by £q,r  c^q,r 

To  evaluate  the  third  term  in  the  right-hand  side  of  (82),  we  observe  that 
E  ^  ^  dq  dr  Uqp  Urp  dr  Ura  Hqr  ^  ^  dc  Uca  Hrc  ^  ^  Uqp  Urp  Ura  E  dq  (dr)^  Hqr  ^  ^  dc  Uca  Hrc- 

q^r  c^q,r  q^r  c^q,r 

(88) 

It  follows  from  the  independence  of  the  random  variables  involved  that 


E  dq(dr)2  Hqr  E  **  u ca  Hrc  =  (E  dq)  (E  (dr?)  E  Hqr  ^  dc  Uca  Ht 


when  q  ^  r.  It  follows  from  the  fact  that  E  dq  =  0  (or  E  ( dr )2  =  0)  that 


(E  dq)  (E  (dr)2)  (  E  HqrY  dc  Uca  HrC  J  =  0. 

\  c^q,r  / 


Combining  (88),  (89),  and  (90)  yields  that 


E  ^  ^  dq  dr  Uqp  Urp  dr  Ura  Hqr  ^  ^  dc  U ca  Hrc  0. 
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Similarly, 


(92) 


E£  dq  dr  U qp  U rp  dq  U qa  Hrq  £  db  Uba  ddqb  0* 
q^r  bj^q,r 

Combining  (81),  (82),  (85),  (87),  (91),  and  (92)  yields  that 

e£  dq  dr  U qp  Urp  £  db  Uba  H qb  £  dc  U ca  Hrc  9. 
q^r  b^q  c^r 

Combining  (68),  (69),  (80),  and  (93)  yields  that 

E  \Epa\2  <  1 

It  follows  from  (94)  that 

k 

E  \Epa\2<k2l. 

p,a= 1 

However, 

k 

ll-EII2  <  £  l^l2. 

p,a=l 

Combining  (96)  and  (95)  yields  that 

E  || E\\2  <  k2l. 

Combining  (97)  and  the  Markov  inequality  yields  that 

l|S||  <  vWi 

with  probability  at  least  1  —  4.  Combining  (98)  and  (66)  yields  (67). 


(93) 

(94) 

(95) 

(96) 

(97) 

(98) 


4.2  Spectral  norms  of  various  random  matrices 

In  this  subsection,  we  derive  bounds  on  the  spectral  norms  of  several  random  matrices. 
With  the  choice  a  =  8  and  (3  —  2,  the  following  lemma  states  that,  with  probability  at 

least  1,  the  least  singular  value  of  the  complex  l  x  k  matrix  7 ZU  is  at  least  and  the 

greatest  singular  value  is  at  most  where  7 Z  is  the  l  x  m  SRFT  defined  in  Section  2,  U 

is  a  complex  m  x  k  matrix  whose  columns  are  orthonormal,  and  m  >  l  >  3k2.  This  lemma 
is  similar  to  the  subspace  Johnson- Lindenstrauss  lemma  (Corollary  11)  of  [16]. 


Lemma  4.4  Suppose  that  a  and  (3  are  real  numbers  greater  than  l,  and  k,  l,  and  m  are 
positive  integers,  such  that 


m  >  l  > 


a 2  (3 
(a-  l)2 


k2. 


(99) 
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Suppose  further  that  TZ  is  the  l  x  m  SRFT  defined  in  Section  2,  U  is  a  complex  mx  k  matrix 
whose  columns  are  orthonormal,  and  C  is  the  complex  k  x  k  matrix  defined  via  the  formula 


c  =  (ku)*  c tzu ). 

Then,  the  least  (that  is,  the  kth  greatest)  singular  value  of  TZU  satisfies 


and  (simultaneously)  the  greatest  singular  value  o\  of  TZU  satisfies 

with  probability  at  least  1  — 


(100) 

(101) 

(102) 


Lemma  4.5  Suppose  that  l  and  m  are  positive  integers  with  l  <  m,  and  7 Z  is  the  l  x  m 
SRFT  defined  in  Section  2. 


Then, 


\\TZ\\  <  Vim. 

(106) 

Proof.  It  follows  from  (6)  that 

m<\\S\\  ||F||  \\D\\. 

(107) 

However, 

I|S||  <  Vi, 

(108) 
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\\F\\  <  Vm, 

(109) 

and 

IPII  =  1- 

(110) 

Combining  (107),  (108),  (109),  and  (110)  yields  (106). 

□ 

The  following  lemma  states  that,  for  any  matrix  A,  with  high  probability  there  exists  a 
matrix  T  with  a  reasonably  small  spectral  norm,  such  that  T1ZA  is  a  good  approximation 
to  A,  where  7Z  is  the  SRFT  defined  in  Section  2. 

Lemma  4.6  Suppose  that  k,  l,  m,  and  n  are  positive  integers  with  k  <  l,  such  that  l  <  m 
and  l  <  n.  Suppose  further  that  a  and  /3  are  real  numbers  greater  than  1,  such  that 

m>  l>  13  k2.  (Ill) 

(a  -  l)2 

Suppose  finally  that  A  is  a  complex  m  x  n  matrix,  and  1Z  is  the  l  x  m  SRFT  defined  in 
Section  2. 

Then,  there  exists  a  complex  m  x  l  matrix  T  such  that 

\\T1ZA  —  7L||  <  \J am  +  1  oy.+i  (112) 

and 

imi  <  (in) 

with  probability  at  least  1  —  jj,  where  <Jk+i  is  the  (k  +  l)s<  greatest  singular  value  of  A. 

Proof.  We  prove  the  existence  of  a  matrix  T  satisfying  (112)  and  (113)  by  constructing  one. 
We  start  by  forming  an  SVD  of  A, 

A  =  UTj  V*,  (114) 

where  U  is  a  complex  unitary  m  x  m  matrix,  E  is  a  real  m  x  n  matrix  whose  entries  are 
nonnegative  everywhere  and  zero  off  of  the  main  diagonal,  and  V  is  a  complex  unitary  n  x  n 
matrix,  such  that 

=  CTi  (115) 

for  %  —  1,  2,  . . . ,  min{m,  n\  —  1,  min{m,  n},  where  E ^  is  the  entry  in  row  i  and  column  i  of 
E,  and  a,  is  the  ith  greatest  singular  value  of  A. 

Next,  we  define  auxiliary  matrices  G  and  H .  We  define  G  to  be  the  leftmost  l  x  k  block 
of  the  l  x  m  matrix  1Z  U,  and  FI  to  be  the  rightmost  l  x  (m  —  k )  block  of  1Z  U,  so  that 

TIU=(G\H).  (116) 

We  define  G('_1')  to  be  the  complex  k  x  /  matrix  given  by  the  formula 

G(-T  =  (G*  G)-1  G* .  (117) 
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Finally,  we  define  T  to  be  the  m  x  /  matrix  given  by 


T  —  U 


0 


(118) 


Combining  (12),  (117),  (116),  and  (101)  (using  the  leftmost  k  columns  of  the  matrix  U 
from  the  present  proof  as  the  matrix  denoted  U  in  (101)  and  (100))  yields  that 

(119) 


E  = 

Combining  (114),  (116),  and  (118)  yields  that 

G 

TTZA  -  A  =  U  '  ' 


$ 

0 

0 

It^ll  *  fj 

with  probability  at  least  1  —  7.  Combining  (118),  (119),  and  the  fact  that  U  is  unitary 
yields  (113). 

We  now  show  that  T  defined  in  (118)  satisfies  (112). 

We  define  to  be  the  leftmost  uppermost  k  x  k  block  of  E,  and  \F  to  be  the  rightmost 
lowermost  (m  —  k)  x  (n  —  k)  block  of  E,  so  that 

r»  \ 

(120) 

(121) 

(122) 

(123) 

(124) 

(125) 


0 


Combining  (117)  and  (120)  yields  that 

G<- 0 


0 


(Glff)-1  E=U 


H 

)-lj  E  V' 

0 

G(_1)  H  vF 

0 

-vF 

Furthermore, 

Moreover, 

Combining  (120)  and  (115)  yields  that 


0 

G(-b  H  vF  N 

0 

||g(_1)  h  \f| 

<  G(_1)  H  \F  2 +  11^1 


ll^ll  <  crk+ 1. 

Combining  (121),  (122),  (123),  (124),  (125),  and  the  fact  that  U  and  V  are  unitary  yields 
that 

\\THA-A\\  <  \J ||G(_1)||2  \\H\\*  +  l<rk+1.  (126) 

Clearly, 

l|ff||<||(G[H  )||.  (127) 

Combining  (116)  and  the  fact  that  U  is  unitary  yields  that 

||(G|/r)||  =  ||R||.  (128) 

Combining  (127),  (128),  and  (106)  yields  that 

||i7||  <  Vim.  (129) 

Combining  (126),  (119),  and  (129)  yields  (112).  □ 
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4.3  Randomized  linear  least-squares  regression 

In  this  subsection,  we  derive  bounds  regarding  randomized  methods  for  the  solution  in  the 
least-squares  sense  of  overdetermined  systems  of  linear-algebraic  equations. 

The  following  lemma  states  that,  with  high  probability,  there  exists  a  matrix  0  with 
a  reasonably  small  spectral  norm,  such  that  0  is  the  inverse  of  the  SRFT  77  (defined  in 
Section  2)  on  the  image  under  77  of  a  certain  subspace. 

Lemma  4.7  Suppose  that  a  and  (3  are  real  numbers  greater  than  1,  and  k,  l,  and  m  are 
positive  integers,  such  that 

m>  l>  (2k)2.  (130) 

(a  —  l)z 

Suppose  further  that  77  is  the  l  x  m  SRFT  defined  in  Section  2,  A  is  a  complex  mxk  matrix, 
and  B  is  a  complex  mxk  matrix. 

Then,  there  exists  a  complex  m  x  l  matrix  0  such  that 


0774  =  A, 

(131) 

0775  =  B, 

(132) 

and 

m<^ 

(133) 

with  probability  at  least  1  — 

Proof.  We  define  U  to  be  a  complex  matrix  whose  columns  constitute  an  orthonormal  basis 
of  the  subspace  of  Cm  spanned  by  the  columns  of  A  and  the  columns  of  B.  We  define  j  to 
be  the  number  of  columns  in  U .  Combining  the  facts  that  A  has  k  columns  and  that  B  has 
k  columns  yields  that 

j  <  2k.  (134) 

Combining  (130),  (134),  (101),  and  the  fact  that  the  columns  of  U  are 
that  the  least  (that  is,  the  jth  greatest)  singular  value  a3  of  77  U  satisfies 

orthonormal  yields 

v? 

IV 

si^l 

(135) 

with  probability  at  least  1  —  It  follows  from  (135)  that  (77 U)*  (' TZU ) 
can  define  0  to  be  the  complex  m  x  l  matrix 

is  invertible,  so  we 

0  =  U((TZU)*  (1ZU))-1  (TZU)*. 

(136) 

It  follows  from  (136)  that 

0771/  =  U. 

(137) 

Combining  (137)  and  the  fact  that  the  columns  of  A  and  the  columns  of  B  are  in  the  column 
space  of  U  yields  (131)  and  (132). 
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Combining  (136)  and  the  fact  that  the  columns  of  U  are  orthonormal  yields  that 

ll©||  <  \\({nuy  {nu)Yl  (izuy\\.  (138) 

Combining  (12)  and  (135)  yields  that 

\\{{yzuy  (yzu))~l  {izuy\\<  (139) 

Combining  (138)  and  (139)  yields  (133).  □ 

The  following  lemma  states  that,  with  high  probability,  a  k  x  k  matrix  X  minimizing 
\\1Z  A  X  —  1Z  B\\  also  minimizes  \\AX  —  B\\  to  within  a  small  factor,  where  1Z  is  the  l  x  m 
SRFT  defined  in  Section  2,  A  is  an  m  x  k  matrix,  and  B  is  an  m  x  k  matrix.  Whereas 

solving  A  X  B  in  the  least-squares  sense  involves  m  simultaneous  linear  equations,  solving 

1ZAX  ^7 ZB  in  the  least-squares  sense  involves  just  l  simultaneous  linear  equations. 

Lemma  4.8  Suppose  that  a  and  f3  are  real  numbers  greater  than  1,  and  k,  l,  and  m  are 
positive  integers,  such  that 

m  >  l  >  <2*=>2'  (WO) 

(a  —  l)z 

Suppose  further  that  7 Z  is  the  l  x  m  SRFT  defined  in  Section  2,  A  is  a  complex  mxk  matrix, 
B  is  a  complex  mxk  matrix,  X  is  a  complex  k  x  k  matrix  which  minimizes  the  quantity 

\\KAX  -KB\\t  (141) 

and  Y  is  a  complex  k  x  k  matrix  which  minimizes  the  quantity 

\\AY-B\\.  (142) 

Then, 

\\AX  -  B ||  <  V2a  -  1  \\AY-  B\\  (143) 

with  probability  at  least  1  — 

Proof.  Combining  (131)  and  (132)  yields  that 

\\AX  -  B\\  =  \\&TZAX  -  QTZB\\.  (144) 

However, 

\\QTZAX  —  QTZB\\  <  ||0||  \\TZAX  -  KB\\.  (145) 

It  follows  from  the  fact  that  X  minimizes  (141)  that 

\\TZAX  —  TZB\\  <  \\1ZAY  —  1ZB\\.  (146) 

We  next  define  U  to  be  a  matrix  whose  columns  constitute  an  orthonormal  basis  for  the 
subspace  of  Cm  spanned  by  the  columns  of  A  and  the  columns  of  B ,  and  define  j  to  be  the 
number  of  columns  in  U .  Then,  there  exists  a  complex  j  x  k  matrix  Z  such  that 

AY  —  U  Z,  (147) 
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and  there  exists  a  complex  j  x  k  matrix  C  such  that 

B  =  UC.  (148) 

Combining  (147)  and  (148)  yields  that 

\\nAY-KB\\  =  \\nuz-nuc\\.  (149) 

Yet, 

\\nuz-nuc\\<\\nu\\\\z-c\\.  (150) 

Combining  the  facts  that  A  has  k  columns  and  that  B  has  k  columns  yields  that 

j  <  2 k.  (151) 

Combining  (140),  (151),  (102),  and  the  fact  that  the  columns  of  U  are  orthonormal  yields 
that 

||Kf/||<hi(2~i)  (152) 

with  probability  at  least  1  —  \- 

It  follows  from  the  fact  that  the  columns  of  U  are  orthonormal  that 

\\Z-C\\  =  \\UZ-  UC\\.  (153) 

Combining  (147)  and  (148)  yields  that 

\\UZ-UC\\  =  \\AY-B\\.  (154) 

Combining  (153)  and  (154)  yields  that 

\\Z-C\\  =  \\AY-B\\.  (155) 

Combining  (144),  (145),  (146),  (149),  (150),  (152),  (155),  (133),  and  the  fact  that  the 
matrix  U  used  in  the  proof  of  (133)  is  identical  to  the  matrix  U  used  in  the  present  proof 
(so  that  (133)  and  (152)  hold  simultaneously  with  probability  at  least  1  —  |)  yields  (143).  □ 

Remark  4.9  Theorem  12  in  [16]  motivated  us  to  use  Lemma  4.8.  Lemma  4.8  and  its  proof 
are  modelled  after  Theorem  12  in  [16]. 

The  following  lemma  states  that,  with  high  probability,  the  product  P  X  Q*  of  matrices 
P,  X,  and  Q*  is  a  good  approximation  to  a  matrix  A,  provided  that 

1.  A*  P  P*  is  a  good  approximation  to  A*, 

2.  AQQ*  is  a  good  approximation  to  A, 

3.  the  columns  of  Q  are  orthonormal,  and 
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4.  X  minimizes  \\7ZP  X  —  TZAQ\\,  where  1Z  is  the  SRFT  defined  in  Section  2. 

Lemma  4.10  Suppose  that  a  and  j3  are  real  numbers  greater  than  1,  and  k,  l,  m,  and  n  are 
positive  integers,  such  that 

m>  l>  a  ^  (2/c)2.  (156) 

{a-iy 

Suppose  further  that  TZ  is  the  l  x  m  SRFT  defined  in  Section  2,  A  is  a  complex  m  x  n  matrix, 
P  is  a  complex  m  x  k  matrix,  Q  is  a  complex  n  x  k  matrix  whose  columns  are  orthonormal, 
and  X  is  a  complex  k  x  k  matrix  which  minimizes  the  quantity 

\\1ZP  X  -1ZAQ\\.  (157) 

Then, 

\\PXQ*  -A\\  <  sj2a  -  1  \\A*PP*  -  A*\\  +  \\AQQ*  -  A\\  (158) 

with  probability  at  least  1  — 

Proof.  It  follows  from  the  triangle  inequality  that 

\\PXQ*  -  A\\  <  \\P X  Q*  —  AQQ*\\  +  \\AQQ*-A\\.  (159) 

To  derive  a  bound  on  the  first  term  in  the  right-hand  side  of  (159),  we  observe  that 

\\PXQ* -AQQ*\\  <  \\PX-AQ\\  \\Q\\.  (160) 

Combining  (156),  (143),  and  the  fact  that  X  minimizes  (157)  yields  that 

\\PX-AQ\\  <  V2a-l  \\P  (P*  AQ)  —  AQ\\  (161) 

with  probability  at  least  1  —  However, 

\\PP*  AQ  -  AQ\\  <  \\A*PP*  -A*\\  ||g||.  (162) 

It  follows  from  the  fact  that  the  columns  of  Q  are  orthonormal  that 

non  <  i.  ties) 

Combining  (160),  (161),  (162),  and  (163)  yields  that 

\\PXQ*  -AQQ*\\  <  V2a-  1  \\A*  P P*  -  A*\\  (164) 

with  probability  at  least  1  — 

Combining  (159)  and  (164)  yields  (158).  □ 


5  Description  of  the  algorithm 

In  this  section,  we  describe  the  algorithm  of  the  present  paper.  In  Subsection  5.1,  we  discuss 
approximations  to  interpolative  decompositions.  In  Subsection  5.2,  we  discuss  approxima¬ 
tions  to  SVDs.  In  Subsection  5.3,  we  discuss  a  usually  more  efficient  alternative  method 
for  constructing  approximations  to  SVDs.  In  Subsection  5.4,  we  tabulate  the  computational 
costs  of  various  parts  of  the  algorithm. 
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5.1  Interpolative  decomposition 

Suppose  that  k,  m,  and  n  are  positive  integers  with  k  <  m  and  k  <  n,  and  A  is  a  complex 
m  x  n  matrix.  In  this  subsection,  we  will  collect  together  k  appropriately  chosen  columns  of 
A  into  a  complex  m  x  k  matrix  B,  and  construct  a  complex  k  x  n  matrix  P,  such  that 


||P||  <  ^4 k  (n  -  k)  +  1 

(165) 

BP  -  A\\  <  Vkmn  ak+1 , 

(166) 

where  (Jk+i  is  the  (k  +  l)st  greatest  singular  value  of  A.  We  may  assume  without  loss  of 
generality  that  m  is  the  product  of  prime  factors  no  greater  than  a  small  constant  (say  2), 
if  necessary  by  adjoining  to  A  rows  consisting  entirely  of  zeros.  Then,  to  construct  matrices 
B  and  P  satisfying  (165)  and  (166),  we  choose  an  integer  l  near  to,  but  greater  than  k ,  such 
that  /  <  m  and  l  <  n  (for  example,  l  —  k  +  8),  and  make  the  following  three  steps: 

1.  Using  the  algorithm  of  Subsection  3.3,  compute  the  l  x  n  product  matrix 

Y  =  1ZA,  (167) 

where  1Z  is  the  /  x  m  SRFT  defined  in  Section  2.  (This  step  amounts  to  applying  A* 
to  7 Z*.  in  order  to  identify  the  range  of  A*.) 

2.  Using  Observation  3.3,  form  a  complex  l  x  k  matrix  Z  whose  columns  constitute  a 
subset  of  the  columns  of  Y,  and  a  complex  k  x  n  matrix  P  satisfying  (165),  such  that 

|| ZP  -  Y ||  <  y/4k(n-k)  +  l  rjk+i,  (168) 

where  r)k+\  is  the  (k  +  l)st  greatest  singular  value  of  Y. 

3.  Using  the  fact  that  the  columns  of  Z  constitute  a  subset  of  the  columns  of  Y .  for 
j  —  1,  2,  . . . ,  k  —  1,  k,  let  ij  denote  an  integer  such  that  the  jth  column  of  Z  is  the  i/h 
column  of  Y .  Form  the  complex  m  x  k  matrix  B  whose  jth  column  is  the  ijth  column 
of  A  for  j  —  1,  2,, . , . ,  k  —  1,  k. 

It  is  easy  to  see  that  the  matrices  B  and  P  satisfy  (165)  and  (166).  Indeed,  Step  2  above 
guarantees  (165)  by  construction.  Moreover,  combining  (167),  (168),  and  Remark  3.8  yields 
that 

\\1ZB  P  —  7ZA\\  <  \J Ak  (n  -  k)  +  1  r)k+i ,  (169) 

where  77^+1  is  the  (k  +  l)st  greatest  singular  value  of  Y.  Combining  (167)  and  the  fact  that 
77fc_|_i  is  the  ( k  +  l)st  greatest  singular  value  of  Y  yields  that 

Vk+i  <  \\TZ\\  cffc+i,  (170) 

where  (Jk+i  is  the  ( k  +  l)st  greatest  singular  value  of  A.  Suppose  that  a  and  f3  are  real 
numbers  greater  than  1,  such  that 

m>  l>  a  13  k2.  (171) 

( a  —  1)^ 
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Then,  combining  (13),  (112),  (113),  (165),  (169),  (170),  and  (106)  yields  that 


\\B  P  —  H||  <  Ak  (n  —  k)  +  1  +  1)  \/ am  +  1  +  \J Ak  (n  —  k)  +  1  y/am'j  oy.+i  (172) 

with  probability  at  least  1  —  1,  where  (Jk+i  is  the  ( k  +  l)st  greatest  singular  value  of  A.  The 
bound  (172)  is  a  precise  version  of  (166).  We  can  use  the  verification  scheme  described  in 
Subsection  3.4  to  estimate  \\B  P  —  kL||  during  each  run  of  the  algorithm. 

Strictly  speaking,  we  require  that  (171)  hold  in  order  to  prove  our  theoretical  bound  (172). 
However,  numerical  experiments  (some  of  which  are  reported  in  Section  6)  indicate  that  in 
fact  l  need  be  only  very  slightly  greater  than  k\  l  =  k  +  8  always  worked  in  our  experiments. 

5.2  Singular  value  decomposition 

Suppose  that  k,  m,  and  n  are  positive  integers  with  k  <  m  and  k  <  n,  such  that  m  and  n  are 
products  of  prime  factors  no  greater  than  a  small  constant  (2,  for  example).  Suppose  further 
that  A  is  a  complex  m  x  n  matrix.  In  this  subsection,  we  will  construct  an  approximation 
to  an  SVD  of  A  such  that 


\\U  E  V*  -  A\\  <  a/ max{m,  n}  ak+l,  (173) 

where  U  is  a  complex  m  x  k  matrix  whose  columns  are  orthonormal,  V  is  a  complex  n  x  k 
matrix  whose  columns  are  orthonormal,  E  is  a  real  diagonal  k  x  k  matrix  whose  entries  are 
all  nonnegative,  and  (Tk+i  is  the  ( k  +  l)st  greatest  singular  value  of  A.  To  do  so,  we  choose 
an  integer  l  near  to,  but  greater  than  k ,  such  that  l  <  m  and  /  <  n,  and  make  the  following 
ten  steps: 

1.  Using  the  algorithm  of  Subsection  3.3,  compute  the  l  x  n  product  matrix 

Y  =  11A,  (174) 

where  1Z  is  the  l  x  m  SRFT  defined  in  Section  2.  (This  step  amounts  to  applying  A* 
to  77*,  in  order  to  identify  the  range  of  A*.) 

2.  Using  the  algorithm  of  Subsection  3.3,  compute  the  l  x  m  product  matrix 

Y  =  KA*,  (175) 

where  77  is  the  /  x  n  SRFT  defined  in  Section  2.  (This  step  amounts  to  applying  A  to 
77*,  in  order  to  identify  the  range  of  A.) 

3.  Using  an  SVD,  form  a  complex  n  x  k  matrix  Q  whose  columns  are  orthonormal,  such 
that  there  exists  a  complex  k  x  l  matrix  Z  for  which 

WQZ  -  Y*\\  <  T)k+1,  (176) 

where  r]k+i  is  the  ( k  +  l)st  greatest  singular  value  of  Y ,  and  Y  is  defined  in  (174).  (See 
Observation  3.5  for  details  concerning  the  construction  of  such  a  matrix  Q.) 


4.  Using  an  SVD,  form  a  complex  m  x  k  matrix  P  whose  columns  are  orthonormal,  such 
that  there  exists  a  complex  k  x  l  matrix  Z  for  which 

\\PZ-Y*\\  <f)k+u  (177) 

where  fjk+i  is  the  ( k  +  l)st  greatest  singular  value  of  V,  and  Y  is  defined  in  (175).  (See 
Observation  3.5  for  details  concerning  the  construction  of  such  a  matrix  P .) 

5.  Using  the  algorithm  of  Subsection  3.3,  compute  the  l  x  k  product  matrix 

W  =  11P  (178) 

where  7 Z  is  the  same  realization  of  the  l  x  m  SRFT  as  in  (174),  and  P  is  from  (177). 

6.  Compute  the  l  x  k  product  matrix 


B  =  YQ  (179) 

where  Y  is  defined  in  (174),  and  Q  is  from  (176). 

7.  Compute  the  complex  k  x  k  matrix  X  which  minimizes  the  quantity 

||U' X  -  B\\,  (180) 

where  W  is  defined  in  (178),  and  B  is  defined  in  (179).  (See,  for  example,  Section  5.3 
in  [8]  for  details  concerning  the  construction  of  such  a  minimizing  X .) 

8.  Construct  an  SVD  of  X  from  (180),  that  is, 

X  =  UXY(VXY,  (181) 

where  Ux  is  a  complex  k  x  k  matrix  whose  columns  are  orthonormal,  Vx  is  a  complex 
k x k  matrix  whose  columns  are  orthonormal,  and  E  is  a  real  diagonal  kxk  matrix  whose 
entries  are  all  nonnegative.  (See,  for  example,  Chapter  8  in  [8]  for  details  concerning 
the  construction  of  such  an  SVD.) 

9.  Compute  the  m  x  k  product  matrix 

U  =  PUX,  (182) 

where  P  is  from  (177),  and  Ux  is  from  (181). 

10.  Compute  the  n  x  k  product  matrix 

V  =  QVX ,  (183) 

where  Q  is  from  (176),  and  Vx  is  from  (181). 
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It  is  easy  to  see  that  the  matrices  U,  £,  and  V  satisfy  (173).  Indeed,  combining  (180),  (178), 
(179),  and  (174)  yields  that  X  minimizes  the  quantity  (157).  Suppose  that  a  and  (3  are  real 
numbers  greater  than  1,  such  that 

m  >  l  >  (2 k)\  (184) 

Then,  combining  (184)  and  the  fact  that  X  minimizes  the  quantity  (157)  yields  (158),  that 

is,  _ 

\\P X  Q*  —  A\\  <  V2a  -  1  \\A*PP*  -  A*\\  +  \\AQQ*  -A\\  (185) 

with  probability  at  least  1  —  7. 

We  bound  \\AQQ*  —  A||  first,  then  \\A*  P  P*  —  A*||.  It  follows  from  (174)  that 

rjk+i  <  ||77||  ak+i,  (186) 

where  rjk+ 1  is  the  (k  +  l)st  greatest  singular  value  of  Y,  and  ak+ 1  is  the  ( k  +  l)st  greatest 
singular  value  of  A.  Combining  (18),  (112),  (113),  (176),  (174),  (186),  and  (106)  yields  that 

|| AQQ*  —  A\\  <2  Jam  +  1  +  a /arnj  ak+1  (187) 

with  probability  at  least  1  — 

Similarly, 

\\A*  P  P*W  A*\\<2  (y/an  +  1  +  y/anj  ak+1  (188) 

with  probability  at  least  1  — 

Combining  (185),  (187),  and  (188)  yields  that 

\\P  X  Q*  —  A\\  <  2  (V2a  -1  +  1)  max{m,  n}  +  1  +  \J a  max{m,  n}  j  ak+i  (189) 

with  probability  at  least  1  —  Combining  (189),  (181),  (182),  and  (183)  yields  that 

\\U  E  V*  —  7l||  <2  (v^a  —  1  +  1)  ^\/a;  max{m,  n}  +  1  +  \J a  max{m,  n}  j  crk+ 1  (190) 

with  probability  at  least  1  —  The  bound  (190)  is  a  precise  version  of  (173).  We  can  use 
the  verification  scheme  described  in  Subsection  3.4  to  estimate  ||I/£C*  —  7l||  during  each 
run  of  the  algorithm. 

Remark  5.1  Step  7  is  motivated  by  an  idea  from  [15],  [1],  [7]  of  using  the  SRFT  defined  in 
Section  2  for  the  purpose  of  computing  a  solution  in  the  least-squares  sense  to  an  overdeter¬ 
mined  system  of  linear-algebraic  equations. 

Remark  5.2  It  is  possible  to  replace  the  l  x  n  matrix  Y  defined  in  (174)  with  a  k  x  n  matrix, 
by  applying  the  algorithm  of  Subsection  5.1  to  YT  and  using  the  transpose  of  the  obtained 
n  x  k  matrix  in  place  of  Y .  Similarly,  it  is  possible  to  replace  the  l  x  m  matrix  Y  defined 
in  (175)  with  a  k  x  rri  matrix,  by  applying  the  algorithm  of  Subsection  5.1  to  YT  and  using 
the  transpose  of  the  obtained  m  x  k  matrix  in  place  of  Y .  Furthermore,  it  is  possible  to 
obtain  replacements  for  Y  and  Y  by  using  UQ  R ”  decompositions  or  SVDs  in  place  of  the 
interpolative  decompositions  employed  by  the  algorithm  of  Subsection  5.1. 
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5.3  Singular  value  decomposition  by  means  of  the  interpolative 
decomposition 

In  this  subsection,  we  provide  an  alternative  to  the  algorithm  of  Subsection  5.2  for  computing 
an  approximation  to  a  singular  value  decomposition.  This  alternative  is  usually  somewhat 
more  efficient. 

Suppose  that  k,  m,  and  n  are  positive  integers  with  k  <  m  and  k  <  n,  and  A  is  a  complex 
m  x  n  matrix.  We  will  compute  an  approximation  to  an  SVD  of  A  such  that 

\\UZV*  -  A\\  <  Vh^n  ak+u  (191) 

where  U  is  a  complex  m  x  k  matrix  whose  columns  are  orthonormal,  V  is  a  complex  n  x  k 
matrix  whose  columns  are  orthonormal,  E  is  a  real  diagonal  kxk  matrix  whose  entries  are 
all  nonnegative,  and  &k+i  is  the  {k  +  l)st  greatest  singular  value  of  A.  To  do  so,  we  choose  an 
integer  l  near  to,  but  greater  than  k,  such  that  /  <  m  and  l  <  n  (for  example,  l  =  /c  +  8),  and 
use  the  algorithm  of  Subsection  5.1  to  construct  the  matrices  B  and  P  in  (165)  and  (166). 
Then,  we  make  the  following  four  steps: 

1.  Construct  a  lower  triangular  complex  k  x  k  matrix  L,  and  a  complex  n  x  k  matrix  Q 
whose  columns  are  orthonormal,  such  that 

P  =  LQ* .  (192) 

(See  Remark  3.11  for  details  concerning  the  construction  of  such  matrices  L  and  Q.) 

2.  Compute  the  m  x  k  product  matrix 

C  =  BL.  (193) 

3.  Construct  an  SVD  of  C,  that  is, 

C  =  f/sr,  (194) 

where  U  is  a  complex  mxk  matrix  whose  columns  are  orthonormal,  E  is  a  real  diagonal 
kxk  matrix  whose  entries  are  all  nonnegative,  and  W  is  a  complex  kxk  matrix  whose 
columns  are  orthonormal.  (See,  for  example,  Chapter  8  in  [8]  for  details  concerning 
the  construction  of  such  an  SVD.) 

4.  Compute  the  n  x  k  product  matrix 


V  =  QW.  (195) 

It  is  clear  that  the  columns  of  U  are  orthonormal,  as  are  the  columns  of  V,  and  that  the 
entries  of  E  are  all  nonnegative  and  are  zero  off  of  the  main  diagonal.  It  is  easy  to  see  that 
the  matrices  V,  E,  and  V  satisfy  (191).  Indeed,  suppose  that  a  and  (3  are  real  numbers 
greater  than  1,  such  that 

m>  l  >  a  ^  k2.  (196) 
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Then,  combining  (172)  and  Lemma  3.10  yields  that 

\\U  £  V*  —  A  ||  <  Ak  (n  —  k)  +  1  +  1)  \J  ocm  +  1  +  \J Ak  Jn  —  k)  +  1  ^/arn'j  <Tk+i  (197) 

with  probability  at  least  1  —  where  (Jk+i  is  the  ( k  +  l)st  greatest  singular  value  of  A.  The 
bound  (197)  is  a  precise  version  of  (191).  We  can  use  the  verification  scheme  described  in 
Subsection  3.4  to  estimate  \\UTiV*  —  7L||  during  each  run  of  the  algorithm. 

Strictly  speaking,  we  require  that  (196)  hold  in  order  to  prove  our  theoretical  bound  (197). 
However,  numerical  experiments  (some  of  which  are  reported  in  Section  6)  indicate  that  in 
fact  l  need  be  only  very  slightly  greater  than  k\  l  =  k  +  8  always  worked  in  our  experiments. 

Remark  5.3  Steps  2  and  4  in  the  procedure  of  the  present  subsection  are  somewhat  subtle 
numerically.  Both  Steps  2  and  4  involve  constructing  products  of  matrices,  and  in  general 
constructing  the  product  5  12  of  matrices  H  and  12  can  be  numerically  unstable.  Indeed,  in 
general  some  entries  of  £  or  12  can  have  unmanageably  large  absolute  values,  while  in  exact 
arithmetic  no  entry  of  the  product  £12  has  an  unmanageably  large  absolute  value;  in  such 
circumstances,  constructing  the  product  £  12  can  be  unstable  in  finite-precision  arithmetic. 
However,  this  problem  does  not  arise  in  Steps  2  and  4  above,  due  to  (165),  (25),  the  fact 
that  the  columns  of  B  constitute  a  subset  of  the  columns  of  A  (so  that  ||R||  <  ||H||),  and 
the  fact  that  the  columns  of  Q  are  orthonormal,  as  are  the  columns  of  W. 

5.4  Costs 

In  this  subsection,  we  tabulate  the  numbers  of  floating-point  operations  required  by  the 
algorithm  described  in  Subsections  5.1,  5.2,  and  5.3,  as  applied  once  to  a  complex  m  x  n 
matrix  A. 

5.4.1  Interpolative  decomposition 

The  algorithm  of  Subsection  5.1  incurs  the  following  costs  in  order  to  compute  an  approxi¬ 
mation  to  an  interpolative  decomposition  of  A: 

1.  Computing  Y  in  (167)  costs  0(mn  log(/)). 

2.  Computing  Z  and  P  in  (168)  costs  Oilkn  log(n)). 

3.  Forming  B  in  Step  3  requires  retrieving  k  columns  of  the  m  x  n  matrix  A,  which  costs 
O(km). 

The  verification  scheme  of  Subsection  3.4  requires  applying  A,  B,  and  P  to  a  fixed  number 
(say  6)  of  vectors,  at  costs  of  0(mn),  O(km),  and  0{kn).  Summing  up  the  costs  for  Steps  1- 
3  above  and  for  the  verification  scheme,  we  conclude  that  the  algorithm  of  Subsection  5.1 
costs 

Cid  =  0(mn  log (/)  +  Ikn  log(n)).  (198) 

Remark  5.4  When  “Qi?”  decompositions  are  used  as  in  [5]  to  compute  the  matrices  Z  and 
P  in  (168),  the  cost  of  the  algorithm  of  Subsection  5.1  is  usually  less  than  the  cost  of  the 
algorithm  of  Subsection  5.2,  typically 

C'm  =  0(mn  log(/)  +  Ikn).  (199) 
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5.4.2  Singular  value  decomposition 

The  algorithm  of  Subsection  5.2  incurs  the  following  costs  in  order  to  compute  an  approxi¬ 
mation  to  a  singular  value  decomposition  of  A: 

1.  Computing  Y  in  (174)  costs  0(mn  log(/)). 

2.  Computing  Y  in  (175)  costs  0(mn  log(/)). 

3.  Computing  Q  in  (176)  costs  0{l2n). 

4.  Computing  P  in  (177)  costs  0(l2m). 

5.  Computing  W  in  (178)  costs  0(km  log(/)). 

6.  Computing  B  in  (179)  costs  0(lkn). 

7.  Computing  X  minimizing  (180)  costs  0(k2l). 

8.  Computing  the  SVD  (181)  of  X  costs  0(k3). 

9.  Computing  U  in  (182)  costs  0{k2m). 

10.  Computing  V  in  (183)  costs  0(k2  n). 

The  verification  scheme  of  Subsection  3.4  requires  applying  A,  U,  E,  and  V*  to  a  fixed 
number  (say  6)  of  vectors,  at  costs  of  0(mn),  O(km),  O(k),  and  0(kn).  Summing  up  the 
costs  for  Steps  1-10  above  and  for  the  verification  scheme,  we  conclude  that  the  algorithm 
of  Subsection  5.2  costs 

Csvd  =  0(mn  log(/)  +  l2(m  +  n )).  (200) 

Remark  5.5  With  the  modifications  described  in  Remark  5.2,  the  scheme  of  Subsection  5.2 
costs 

CgVD  =  0{mn  log(/)  +  k2(m  +  n )  +  kl2  log(Z)).  (201) 

(201)  can  be  less  than  (200)  when  l  is  large.  However,  while  our  current  theoretical  bounds 
require  l  >  k2  to  guarantee  good  accuracy,  our  numerical  experiments  (some  of  which  are 
reported  in  Section  6)  indicate  that  l  >  k  +  5  suffices.  For  l  —  k  +  5,  the  estimate  (200)  is 
as  tight  as  (201). 

5.4.3  Singular  value  decomposition  by  means  of  the  interpolative  decomposition 

The  algorithm  of  Subsection  5.3  incurs  the  following  costs  in  order  to  compute  an  approxi¬ 
mation  to  a  singular  value  decomposition  of  A,  in  addition  to  (198): 

1.  Constructing  L  and  Q  in  (192)  costs  0(k2n). 

2.  Computing  C  in  (193)  costs  0{k2m). 

3.  Computing  the  SVD  of  C  in  (194)  costs  0(k2  m). 
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4.  Computing  V  in  (195)  costs  0(k2n). 

The  verification  scheme  of  Subsection  3.4  requires  applying  A,  U,  E,  and  V*  to  a  fixed 
number  (say  6)  of  vectors,  at  costs  of  0(mn),  0(km),  0(k),  and  0(kn).  Summing  up  the 
costs  for  Steps  1-4  above  and  for  the  verification  scheme,  plus  (198),  we  conclude  that  the 
algorithm  of  Subsection  5.3  costs 

Csvd(id)  =  0{mn  log(/)  +  Ikn  log(n)  +  k2  m).  (202) 

Remark  5.6  As  in  Remark  5.4,  when  “Q  R”  decompositions  are  used  as  in  [5]  to  compute 
the  matrices  Z  and  P  in  (168),  the  cost  of  the  algorithm  of  Subsection  5.3  is  usually  less 
than  the  cost  of  the  algorithm  of  Subsection  5.2,  typically 

CgvD(iD)  =  0(mn  log(/)  +  Ikn  +  k 2  m ).  (203) 


6  Numerical  examples 

In  this  section,  we  describe  the  results  of  several  numerical  tests  of  the  algorithm  of  the 
present  paper.  Tables  1-6  summarize  the  numerical  output  of  applying  the  algorithm  to  the 
matrix  A  defined  below  for  each  of  the  examples. 

In  all  of  the  tables,  we  set  l  =  k  +  8  for  the  user-specified  parameter  /,  where  k  is  the 
rank  of  the  approximations  constructed  by  the  algorithms.  As  described  below,  many  of  the 
entries  in  the  tables  report  the  worst  or  average  results  over  multiple  trials  of  the  randomized 
algorithms.  We  conducted  30  randomized  trials  for  Tables  1  and  4,  100  randomized  trials 
for  Tables  2  and  5,  and  500  randomized  trials  for  Tables  3  and  6.  For  the  verification  scheme 
of  Subsection  3.4,  we  ran  6  independent  tests  of  each  randomized  approximation  matrix, 
exactly  as  described  in  Subsection  3.4. 

Tables  1-3  display  the  results  of  applying  the  interpolative  decomposition  algorithm  of 
Subsection  5.1  once  to  the  matrix  A  defined  below  for  each  example.  Tables  4-6  display 
the  results  of  applying  the  singular  value  decomposition  algorithm  of  Subsection  5.3.  The 
numerical  experiments  reported  in  [19]  indicate  that  the  algorithm  of  Subsection  5.2  is  not 
competitive  with  the  algorithm  of  Subsection  5.3  in  terms  of  either  accuracy  or  efficiency. 

In  all  of  the  tables,  k  is  the  rank  of  the  approximations  constructed  by  the  algorithms, 
and  <Tfc+i  is  the  (k  +  l)st  greatest  singular  value  of  A;  oy.+1  is  also  the  spectral  norm  of  the 
difference  between  the  original  matrix  A  and  its  best  rank- A;  approximation. 

In  Tables  1-3,  direct  is  the  spectral  norm  of  the  difference  between  the  original  matrix 
A  and  the  approximation  B  P  to  an  interpolative  decomposition  obtained  via  the  pivoted 
“Q  R”  decomposition  algorithm  of  [5]  that  is  based  upon  plane  (Householder)  reflections.  In 
Tables  4-6,  direct  is  the  spectral  norm  of  the  difference  between  the  original  matrix  A  and  the 
approximation  U  EH*  to  an  SVD  obtained  via  following  up  a  pivoted  “Q  R”  decomposition 
algorithm  based  upon  plane  (Householder)  reflections  with  a  call  to  the  LAPACK  3.1.1 
divide-and-conquer  SVD  routine  dgesdd. 

In  Tables  1-3,  5fast  is  the  maximum  over  multiple  randomized  trials  of  the  spectral  norm  of 
the  difference  between  the  original  matrix  A  and  the  approximation  B  P  to  an  interpolative 
decomposition  obtained  via  the  randomized  algorithm.  In  Tables  4-6,  <5fast  is  the  maximum 
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over  multiple  randomized  trials  of  the  spectral  norm  of  the  difference  between  the  original 
matrix  A  and  the  approximation  U  E  V*  to  an  SVD  obtained  via  the  randomized  algorithm. 

In  Tables  1-3,  <5fast/£est  is  the  maximum  over  multiple  randomized  trials  of  the  factor  by 
which  the  randomized  verification  scheme  underestimated  the  spectral  norm  of  the  difference 
between  A  and  the  approximation  BP  to  an  interpolative  decomposition  obtained  via  the 
randomized  algorithm.  In  Tables  4-6,  <5fast/t>est  is  the  maximum  over  multiple  randomized 
trials  of  the  factor  by  which  the  randomized  verification  scheme  underestimated  the  spectral 
norm  of  the  difference  between  A  and  the  approximation  U  E  V*  to  an  SVD  obtained  via 
the  randomized  algorithm. 

In  Tables  1-3,  f direct  is  the  number  of  seconds  of  CPU  time  taken  by  the  pivoted  “Q  R” 
decomposition  algorithm  of  [5]  that  is  based  upon  plane  (Householder)  reflections.  In  Ta¬ 
bles  4-6,  t direct  is  the  number  of  seconds  of  CPU  time  taken  by  the  combination  of  a  pivoted 
“Q  R”  decomposition  algorithm  based  upon  plane  (Householder)  reflections  and  the  LA- 
PACK  3.1.1  divide-and-conquer  SVD  routine  dgesdd. 

In  all  of  the  tables,  tfast  is  the  average  over  multiple  randomized  trials  of  the  number  of 
seconds  of  CPU  time  taken  by  the  randomized  algorithm  plus  the  number  of  seconds  taken  by 
the  verification  scheme;  every  approximation  produced  by  the  randomized  algorithm  passed 
the  verification  test  during  our  experiments  (as  well  as  during  all  of  our  experiments  with 
l  >  k  +  5). 

The  values  of  direct  and  <5fast  displayed  in  the  tables  are  those  obtained  via  the  power 
method  for  estimating  the  spectral  norm  of  a  matrix. 

Tables  1  and  4  report  the  results  of  applying  the  algorithms  of  Subsections  5.1  and  5.3 
to  the  4,096  x  4,096  matrix  defined  via  the  formula 


1+2 

A  =  J2u(k)  °k(vwy, 

k=  1 


where 


for  k  —  1,  2,  . . 


=  10  — 120-([fc— 1]  mod  10)/(J+1) 

+  1,  l  +  2, 


v(k)\  .  —  _  2nijk/4096 

h  v/4096 


(204) 


(205) 


(206) 
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for  j  —  1,  2,  ,  4,095,  4,096  and  k  —  1,  2,  . . . ,  l  +  1,  l  +  2,  and 

110), 


«(1))T  = 

1 

(  1 

1  ...  1 

\Jn  —  1 

m(2))t  = 

(  0  0 

001), 

V3))T  = 

1 

(  1 

-1  1  -1 

pn  —  2 

m(4))t  = 

0 

1 

O 

O 

u(5))T  = 

7l(° 

0 

0010 

u(6))T  = 

0 

0000 

1-11-10  0), 


(207) 

(208) 

(209) 

(210) 
(211) 

oo),  (212) 

(213) 

and  so  on.  More  precisely,  for  k  =  4,  5,  ...,/  +  1,  /  +  2,  the  (4k  —  15)th  and  (4k  —  13)th 
entries  of  are  ^  and  — ^ ,  and  all  other  entries  of  v!k)  are  zero.  Obviously,  oy,  oy,  . . . , 
ct;+i,  cr ;_|_2  are  the  nonzero  singular  values  of  A.  We  note  that  ||^4||  —  a i  =  1. 

Tables  2  and  5  report  the  results  of  applying  the  algorithms  of  Subsections  5.1  and  5.3  to 
the  2,048  x  2,048  matrix  A  effecting  convolution  with  the  complex  2,048  x  1  column  vector 
7  with  the  entry 


7 j  = 


2048 


2048 

£• 

k= 1 


,-27ri0'-l)(fc-l)/2048 


(214) 


for  j  =  1,  2,  ... ,  2,047,  2,048,  where 


ak  =  io_24'hfc_1l  mod  2)/(*+1)  (215) 

for  k  —  1,  2,  ...,/  +  1,  /  +  2,  and 

°k  =  0  (216) 

otherwise  (for  k  =  l  +  3,  1  +  4,  ...,  2,047,  2,048).  Combining  the  facts  that  A  effects 
convolution  with  7,  and  that  7  is  a  discrete  Fourier  transform  of  oy,  <j2,  . . . ,  07047,  02048, 
yields  that  oy,  oy,  . . . ,  07047,  07043  are  the  singular  values  of  A.  We  note  that  ||7L||  =  oy  =  1. 

Tables  3  and  6  report  the  results  of  applying  the  algorithms  of  Subsections  5.1  and  5.3 
to  the  1,024  x  1,024  matrix  A  defined  via  the  formula 

1+2 

A  =  ^u(k)ak(v(k))*,  (217) 

k=l 

where 

ak  =  10-i2-(fc-i)/0+i)  (218) 

for  k  —  1,  2,  ...,/  +  1,  /  +  2,  and  #,  u <y2\  . . . ,  n^+1),  u (,+2^  and  v^,  v^2\  . . . , 

are  two  independent  sets  of  orthonormal  vectors  obtained  by  applying  the  Gram-Schmidt 

process  to  vectors  whose  entries  are  drawn  i.i.d.  from  a  pseudorandom  number  generator 
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with  a  complex  Gaussian  distribution  of  zero  mean  and  unit  variance.  Obviously,  cy,  02, 
. . . ,  ai+ 1,  ai+ 2  are  the  nonzero  singular  values  of  A.  We  note  that  ||A||  =  G\  —  1. 

We  performed  all  computations  using  IEEE  standard  double-precision  variables,  whose 
mantissas  have  approximately  one  bit  of  precision  less  than  16  digits  (so  that  the  relative 
precision  of  the  variables  is  approximately  .2E-15).  We  ran  all  computations  on  one  core  of  a 
1.86  GHz  Intel  Centrino  Core  Duo  microprocessor  with  2  MB  of  L2  cache  and  1  GB  of  RAM. 
We  compiled  the  Fortran  77  code  using  the  Lahey/Fujitsu  Express  v6.2  compiler,  with  the 
optimization  flag  — o2  enabled.  We  used  a  double-precision  version  of  P.  N.  Swarztrauber’s 
FFTPACK  library  for  the  fast  Fourier  transforms  required  by  Step  1  in  the  algorithm  of 
Subsection  3.3. 

Remark  6.1  Tables  1-6  indicate  that  the  algorithms  of  Subsections  5.1  and  5.3  are  generally 
more  efficient  than  the  classical  pivoted  “Q  R”  decomposition  algorithm  based  on  plane 
(Householder)  reflections,  followed  by  either  the  algorithm  of  [5]  or  the  LAPACK  3.1.1  divide- 
and-conquer  SVD  routine. 

Remark  6.2  The  numerical  experiments  reported  in  [19]  indicate  that  the  algorithm  of 
Subsection  5.2  is  not  competitive  with  the  algorithm  of  Subsection  5.3  in  terms  of  either 
accuracy  or  efficiency.  However,  the  algorithm  of  Subsection  5.2  is  of  theoretical  interest. 

Remark  6.3  The  entries  in  the  tables  for  <5fast/$est  arc  all  less  than  8y/n,  in  accord  with  (38) 
for  6  verification  trials,  where  A  is  n  x  n  (in  fact,  the  values  are  all  less  than  y/n). 

7  Conclusions  and  generalizations 

This  paper  provides  an  algorithm  for  the  low-rank  approximation  of  arbitrary  matrices. 
Given  the  entries  of  a  matrix  A,  the  algorithm  provides  a  means  for  computing  several  of 
the  greatest  singular  values  and  corresponding  singular  vectors  of  A;  it  is  generally  faster 
than  existing  schemes,  can  access  each  column  of  A  independently  and  at  most  twice,  and 
parallelizes  easily. 

The  theoretical  bounds  derived  in  the  present  paper  should  be  considered  preliminary. 
Our  numerical  experiments  indicate  that  the  algorithm  performs  better  than  our  estimates 
guarantee.  Comfortingly,  the  verification  scheme  of  Subsection  3.4  provides  an  inexpensive 
means  for  determining  the  precision  of  the  approximation  obtained  during  every  run.  If 
the  algorithm  were  to  produce  an  approximation  that  were  less  accurate  than  desired,  then 
one  could  run  the  algorithm  again  with  an  independent  realization  of  the  random  variables 
involved,  in  effect  boosting  the  probability  of  success  at  a  reasonable  additional  expected 
cost. 

Nevertheless,  the  randomized  algorithm  produced  an  approximation  accurate  to  within 
3  digits  of  the  best  possible  during  every  trial  reported  in  the  numerical  experiments  of 
Section  6,  obviating  the  need  to  run  the  algorithm  again  (assuming  that  k  was  chosen 
sufficiently  large  that  the  accuracy  of  the  best  possible  rank -k  approximation  was  sufficiently 
high).  We  are  currently  investigating  our  empirical  observation  that  the  algorithm  produces 
a  nearly  optimal  approximation  whenever  the  user- specified  parameter  l  is  only  very  slightly 


37 


greater  than  the  rank  k  of  the  approximation.  Our  current  theoretical  bounds  require  l  >  k 2 
in  order  to  ensure  good  accuracy. 

The  algorithm  of  the  present  article  admits  several  generalizations  along  the  lines  dis¬ 
cussed  in  [13],  namely: 

1.  If  the  singular  values  of  the  matrix  being  approximated  decay  sufficiently  fast,  then 
the  factors  of  yffir  in  (166)  and  (191),  and  of  max{m,  n}  in  (173),  would  appear  to 
be  superfluous,  both  in  theory  and  in  practice. 

2.  In  the  present  article,  the  rank  k  of  the  approximation  to  be  constructed  and  the  user- 
specified  parameter  l  are  fixed.  In  practice,  one  adjusts  k  and  l  during  the  course  of  the 
algorithm  in  order  to  guarantee  that  the  approximation  attains  a  prescribed  accuracy, 
preferably  using  as  small  a  number  l  as  possible. 

3.  The  present  article  constructs  approximations  to  interpolative  decompositions  and  to 
singular  value  decompositions.  We  have  constructed  a  similar  algorithm  for  approxi¬ 
mating  the  Schur  decomposition  (see,  for  example,  Theorem  7.1.3  in  [8]  for  a  description 
of  the  Schur  decomposition). 

4.  The  present  paper  uses  complex  arithmetic.  When  processing  real  matrices,  one  should 
use  only  real  arithmetic. 
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k 

l 

°7+ 1 

^direct 

^fast 

^fast/  <^est 

^direct 

^fast 

^direct  /^fast 

8 

16 

.100E1 

•  100E1 

.177E1 

41.7 

.29E1 

.17E1 

1.7 

24 

32 

.534E-7 

.755E-7 

.364E-6 

41.2 

.79E1 

.19E1 

4.1 

56 

64 

.588E-9 

.578E-7 

.997E-8 

34.5 

.18E2 

.22E1 

8.2 

120 

128 

.687E-11 

.178E-7 

.514E-9 

39.0 

.37E2 

.29E1 

13 

248 

256 

.622E-11 

.561E-6 

.407E-9 

46.2 

.76E2 

.60E1 

13 

504 

512 

.201E-11 

.162E-6 

.339E-9 

29.9 

.16E3 

.28E2 

5.7 

1016 

1024 

.150E-11 

.651E-6 

.285E-9 

34.1 

.32E3 

.12E3 

2.7 

Table  1:  ID  of  the  4,096  x  4,096  matrix  A  defined  in  (204) 


k 

/ 

&k+ 1 

^direct 

$fast 

^fast/  $est 

^direct 

^fast 

^direct/  ^fast 

8 

16 

.225E-5 

.319E-5 

.823E-5 

5.06 

.72E0 

.42E0 

1.7 

24 

32 

.187E-8 

.  148E-7 

.184E-7 

7.58 

.20E1 

.48E0 

4.1 

56 

64 

.459E-10 

.119E-7 

.793E-9 

9.59 

.44E1 

.58E0 

7.6 

120 

128 

.687E-11 

.569E-7 

.118E-9 

12.7 

.92E1 

.96E0 

9.6 

248 

256 

.263E-11 

.181E-8 

.774E-10 

10.8 

.18E2 

.24E1 

7.4 

504 

512 

.162E-11 

.981E-11 

.759E-10 

11.1 

.36E2 

.13E2 

2.8 

1016 

1024 

.127E-11 

.851E-11 

.723E-10 

11.9 

.72E2 

.46E2 

1.6 

Table  2:  ID  of  the  2,048  x  2,048  matrix  A  effecting  convolution  with  7  defined  in  (214) 


k 

/ 

07+1 

^direct 

Ofast 

^fast/  ^est 

^direct 

^fast 

^direct /^fast 

8 

16 

.225E-5 

.342E-5 

.100E-4 

14.2 

.17E0 

.10E0 

1.7 

24 

32 

.187E-8 

.383E-8 

.163E-7 

18.1 

.47E0 

.12E0 

3.9 

56 

64 

.459E-10 

.127E-9 

.819E-9 

14.5 

.11E1 

.16E0 

6.5 

120 

128 

.687E-11 

.246E-10 

.213E-9 

19.7 

.21E1 

.32E0 

6.7 

248 

256 

.263E-11 

.133E-10 

.119E-9 

14.9 

.46E1 

.97E0 

4.7 

504 

512 

.162E-11 

.956E-11 

.117E-9 

16.8 

.86E1 

.48E1 

1.8 

Table  3:  ID  of  the  1,024  x  1,024  matrix  A  defined  in  (217) 
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k 

l 

°7+ 1 

^direct 

^fast 

^fast/  <^est 

^direct 

^fast 

^direct  /^fast 

8 

16 

.100E1 

•  100E1 

.177E1 

41.7 

.29E1 

•  17E1 

1.7 

24 

32 

.534E-7 

.755E-7 

.364E-6 

41.2 

.80E1 

.20E1 

4.1 

56 

64 

.588E-9 

.575E-7 

.997E-8 

34.5 

.19E2 

.28E1 

6.7 

120 

128 

.687E-11 

.682E-8 

.514E-9 

39.0 

.41E2 

.59E1 

6.9 

248 

256 

.622E-11 

.127E-7 

.407E-9 

46.2 

.87E2 

.19E2 

4.7 

504 

512 

.201E-11 

.150E-7 

.339E-9 

29.9 

.19E3 

.79E2 

2.4 

1016 

1024 

.150E-11 

.689E-8 

.285E-9 

34.1 

.46E3 

.35E3 

1.3 

Table  4:  SVD  of  the  4,096  x  4,096  matrix  A  defined  in  (204) 


k 

/ 

&k+ 1 

^direct 

$fast 

^fast/  $est 

^direct 

^fast 

^direct/  ^fast 

8 

16 

.225E-5 

.319E-5 

.823E-5 

5.06 

.73E0 

.41E0 

1.8 

24 

32 

.187E-8 

.  148E-7 

.184E-7 

7.58 

.20E1 

.52E0 

3.9 

56 

64 

.459E-10 

.956E-8 

.793E-9 

9.59 

.47E1 

.85E0 

5.5 

120 

128 

.687E-11 

.273E-7 

.178E-9 

12.7 

.11E2 

.23E1 

4.6 

248 

256 

.263E-11 

.159E-8 

.774E-10 

10.8 

.24E2 

.86E1 

2.8 

504 

512 

.162E-11 

.981E-11 

.759E-10 

11.1 

.58E2 

.42E2 

1.4 

1016 

1024 

.127E-11 

.851E-11 

.723E-10 

11.9 

.16E3 

.18E3 

0.9 

Table  5:  SVD  of  the  2,048  x  2,048  matrix  A  effecting  convolution  with  7  defined  in  (214) 


k 

/ 

07+1 

^direct 

Ofast 

^fast/  ^est 

^direct 

^fast 

^direct /^fast 

8 

16 

.225E-5 

.342E-5 

.100E-4 

14.2 

.18E0 

.11E0 

1.6 

24 

32 

.187E-8 

.383E-8 

.163E-7 

18.1 

.51E0 

.15E0 

3.4 

56 

64 

.459E-10 

.127E-9 

.819E-9 

14.5 

.12E1 

.30E0 

3.9 

120 

128 

.687E-11 

.246E-10 

.213E-9 

19.7 

.27E1 

.90E0 

3.0 

248 

256 

.263E-11 

.133E-10 

.119E-9 

14.9 

.68E1 

.41E1 

1.7 

504 

512 

.162E-11 

.956E-11 

.117E-9 

16.8 

.20E2 

.20E2 

1.0 

Table  6:  SVD  of  the  1,024  x  1,024  matrix  A  defined  in  (217) 
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